Information flow and entropy production on
Bayesian networks

Sosuke Ito* and Takahiro Sagawa^ 


In this article, we review a general theoretical framework of thermodynamics of
information on the basis of Bayesian networks. This framework can describe a broad
class of nonequilibrium dynamics of multiple interacting systems with complex
information exchanges. For such situations, we discuss a generalization of the second law
of thermodynamics that includes information contents. A key concept here is an
informational quantity called the transfer entropy, which describes the directional information
transfer in stochastic dynamics. The generalized second law gives the fundamental
lower bound of the entropy production in nonequilibrium dynamics, and sheds
modern light on the paradox of “Maxwell’s demon,” which performs measurements and
feedback control at the level of thermal fluctuations.

1 Introduction

1.1 Background 

The second law of thermodynamics is one of the most fundamental laws in physics;
it identifies the upper bound of the efficiency of heat engines [1]. The second law
was established in the nineteenth century, after numerous failed attempts to invent
a perpetual motion machine of the second kind. Today we know that such a machine is
impossible: one can never extract a positive amount of work from a single heat bath in
a cyclic way, or equivalently, the entropy of the whole universe never decreases.

While thermodynamics was originally formulated for macroscopic systems, the
thermodynamics of small systems has been developed over the last two decades. Imagine a
single Brownian particle in water. The particle relaxes to thermal equilibrium in the
absence of external driving, because the water plays the role of a huge heat bath. In this
case, even a single small particle can behave as a thermodynamic system. Moreover,
if we drive the particle by applying a time-dependent external force, the particle is
driven far from equilibrium. Such a small stochastic system is an interesting playing field
in which to investigate “stochastic thermodynamics” [2, 3], which is a generalization of
thermodynamics that includes the role of thermal fluctuations explicitly. We can show that, in
small systems, the second law of thermodynamics can be violated stochastically, but
is never violated on average. The probability of a violation of the second law can
be characterized quantitatively by the fluctuation theorem, which is a
prominent discovery in stochastic thermodynamics. From the fluctuation theorem, we can
reproduce the second law of thermodynamics on average. Stochastic thermodynamics
is applicable not only to a simple Brownian particle, but also to much more
complex systems such as RNA folding and biological molecular motors.

More recently, stochastic thermodynamics has been extended to information
processing. The central idea is that one can utilize information about
thermal fluctuations to control small thermodynamic systems. Such an idea dates
back to the thought experiment of “Maxwell’s demon” in the nineteenth century.
The demon can perform a measurement of the position and the velocity of each
molecule, and manipulate it by utilizing the obtained measurement outcome. By
doing so, the demon can apparently violate the second law of thermodynamics, by
adiabatically decreasing the entropy. The demon has puzzled many physicists for over a
century, and it is now understood that the key to reconciling the demon
with the second law is the concept of information, and
that the demon can be regarded as a feedback controller.

The recent theoretical progress in this field has led to a unified theory of
information and thermodynamics, which may be called information
thermodynamics. Thermodynamic quantities and information contents are treated
on an equal footing in information thermodynamics. In particular, the second law of
thermodynamics has been generalized by including an informational quantity called
the mutual information. The demon is now regarded as a special setup in the general
framework of information thermodynamics. The entropy of the whole universe does
not decrease even in the presence of the demon, if we take into account the mutual
information as a part of the total entropy. Information thermodynamics has recently
been studied experimentally with a colloidal particle and a single electron.

Furthermore, the general theory of information thermodynamics is not restricted
to the conventional setup of Maxwell’s demon, but is applicable to a variety of
dynamics with complex information exchanges. In particular, information thermodynamics
is applicable to autonomous information processing, and is further applicable
to sensory networks and biochemical signal transduction. Such complex and
autonomous information processing can be formulated in a unified way on the basis
of Bayesian networks, which is the main topic of this chapter. An informational
quantity called the transfer entropy [23], which represents the directional information
transfer, is shown to play a significant role in the generalized second law of
thermodynamics on Bayesian networks.

1.2 Basic ideas of information thermodynamics 

Before proceeding to the main part of this chapter, we briefly sketch the basic idea
of information thermodynamics. The simplest model of Maxwell’s demon is known
as the Szilard engine, which is shown in Fig. 1. We consider a single particle in a
box with volume $V$ that is in contact with a heat bath at temperature $T$. The time
evolution of the Szilard engine is as follows. (i) The particle is in thermal equilibrium,
and the position of the particle is uniformly distributed. (ii) We divide the box by
inserting a barrier at the center of the box. (iii) The demon performs a measurement
of the position of the particle, and finds it in the left or right box with probability 1/2.
The obtained information is one bit, or equivalently $\ln 2$ in the natural logarithm. (iv)
If the particle is found in the left (right) box, then the demon slowly moves the barrier
to the right (left), which is feedback control depending on the measurement
outcome. This process is assumed to be isothermal and quasi-static. (v) The barrier
is removed, and the particle returns to the initial equilibrium state.

In step (iv), the single-particle gas is isothermally expanded and a positive amount
of work is extracted. The amount of work can be calculated by using the equation
of state of the single-particle ideal gas (i.e., $pV = k_{\rm B} T$ with $k_{\rm B}$ the Boltzmann
constant):

$$ \int_{V/2}^{V} \frac{k_{\rm B} T}{V'} \, dV' = k_{\rm B} T \ln 2. \tag{1} $$

This is obviously positive, while the entire process seems to be cyclic. The crucial
point here is that the extracted work $k_{\rm B} T \ln 2$ is proportional to the obtained
information $\ln 2$, which suggests a fundamental link between information and
thermodynamics.
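As a quick numerical check of Eq. (1), the following sketch integrates the single-particle ideal-gas pressure from V/2 to V with a midpoint rule. The function name, the number of slices, and the unit choice k_B T = 1 are illustrative assumptions and are not part of the original text.

```python
import math

def szilard_work(kT=1.0, V=1.0, n=100_000):
    """Midpoint-rule estimate of the work extracted in step (iv):
    the integral of p dV' = (kT / V') dV' from V/2 to V."""
    a, b = V / 2.0, V
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        v_mid = a + (i + 0.5) * h   # midpoint of the i-th volume slice
        total += kT / v_mid * h     # pressure times volume increment
    return total

work = szilard_work()
print(work, math.log(2))  # the two numbers should agree closely
```

The result does not depend on the box volume V, only on the ratio of the final to the initial volume, which is why the extracted work is tied to the one bit obtained by the measurement.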


Figure 1: Schematic of the Szilard engine. The demon obtains measurement outcome
m = L (left) or m = R (right), corresponding to one bit ($= \ln 2$) of information. The
demon then extracts $k_{\rm B} T \ln 2$ of work by feedback control.


1.3 Outline of this chapter 

In the following, we present an introduction to a theoretical framework of
information thermodynamics on the basis of Bayesian networks. This chapter is organized
as follows. In Sec. 2, we briefly review the basic properties of information contents:
the Shannon entropy, the relative entropy, the mutual information, and the transfer
entropy. In Sec. 3, we review stochastic thermodynamics by focusing on a simple case
of Markovian dynamics. In particular, we discuss the concept of entropy production.
In Sec. 4, we review the basic concepts and terminology of Bayesian networks. In
Sec. 5, we discuss the general theory of information thermodynamics on Bayesian
networks, and derive the generalized second law of thermodynamics including the
transfer entropy. In Sec. 6, we apply the general theory to special situations such as
repeated measurements and feedback control. In particular, we discuss the
relationship between our approach based on the transfer entropy and another approach based
on the dynamic information flow. In Sec. 7, we summarize this chapter and
discuss the future prospects of information thermodynamics.


2 Brief review of information contents 

In this section, we review the basic properties of several informational quantities. We
first discuss various types of entropy: the Shannon entropy, the relative entropy, and
the mutual information [21, 22]. We next discuss the transfer entropy, which quantifies
the directional information transfer [23].

2.1 Shannon entropy 

We first discuss the Shannon entropy, which characterizes the randomness of
probability variables. Let x be a probability variable with probability distribution p(x).
We first define a quantity called the stochastic Shannon entropy:

$$ s(x) := -\ln p(x), \tag{2} $$

which is large if p(x) is small. The ensemble average of s(x) over all x is equal to the
Shannon entropy:

$$ \langle s(x) \rangle := -\sum_x p(x) \ln p(x). \tag{3} $$

We note that $\langle \cdots \rangle$ denotes the ensemble average throughout this paper. Since
$0 \leq p(x) \leq 1$, we have $s(x) \geq 0$, and therefore

$$ \langle s(x) \rangle \geq 0. \tag{4} $$

Let y be another probability variable that has the joint probability distribution
p(x, y) with x. The conditional probability of x under the condition of y is given by
$p(x|y) := p(x, y)/p(y)$, which is the Bayes rule. We define the stochastic conditional
Shannon entropy as

$$ s(x|y) := -\ln p(x|y), \tag{5} $$

whose ensemble average is the conditional Shannon entropy:

$$ \langle s(x|y) \rangle = -\sum_{x,y} p(x, y) \ln p(x|y). \tag{6} $$

2.2 Relative entropy 

We next introduce the relative entropy (or the Kullback-Leibler divergence), which
is a measure of the difference between two probability distributions. We consider two
probability distributions p and q on the same probability variable x. The relative
entropy between the probability distributions is defined as

$$ D_{\rm KL}(p \| q) := \sum_x p(x) \ln \frac{p(x)}{q(x)} = \sum_x p(x) [\ln p(x) - \ln q(x)]. \tag{7} $$

By introducing the stochastic relative entropy as

$$ d_{\rm KL}(p(x) \| q(x)) := \ln p(x) - \ln q(x), \tag{8} $$

we write the relative entropy as

$$ D_{\rm KL}(p \| q) = \langle d_{\rm KL}(p(x) \| q(x)) \rangle. \tag{9} $$


The relative entropy is always nonnegative. To show this, we use the Jensen
inequality [22]

$$ \left\langle -\ln \frac{q(x)}{p(x)} \right\rangle \geq -\ln \left\langle \frac{q(x)}{p(x)} \right\rangle, \tag{10} $$

which is a consequence of the concavity of the logarithmic function. We then have

$$ D_{\rm KL}(p \| q) \geq -\ln \sum_x p(x) \frac{q(x)}{p(x)} = -\ln \sum_x q(x) = 0, \tag{11} $$

where we used $\sum_x q(x) = 1$. The equality $D_{\rm KL}(p \| q) = 0$ holds if and only if
$q(x) = p(x)$.

We can also show the nonnegativity of the relative entropy in a slightly different
way as follows. We first note that

$$ \langle \exp(-d_{\rm KL}(p(x) \| q(x))) \rangle = 1, \tag{12} $$

because

$$ \langle \exp(-d_{\rm KL}(p(x) \| q(x))) \rangle = \left\langle \frac{q(x)}{p(x)} \right\rangle = \sum_x p(x) \frac{q(x)}{p(x)} = 1. \tag{13} $$

By applying the Jensen inequality to the exponential function, which is convex, we have

$$ \langle \exp(-d_{\rm KL}(p(x) \| q(x))) \rangle \geq \exp(-\langle d_{\rm KL}(p(x) \| q(x)) \rangle). \tag{14} $$

Therefore, we obtain

$$ 1 \geq \exp(-D_{\rm KL}(p \| q)), \tag{15} $$

which implies the nonnegativity of the relative entropy. We note that this proof is
closely related to the fluctuation theorem, as shown in Sec. 3.
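The two routes to nonnegativity can be checked numerically on a small example. The sketch below evaluates Eqs. (7), (8), and (12) for two hypothetical three-state distributions; the function names and the numerical values of p and q are assumptions chosen only for illustration.

```python
import math

def stochastic_kl(p, q, x):
    """Stochastic relative entropy d_KL(p(x)||q(x)) = ln p(x) - ln q(x), Eq. (8)."""
    return math.log(p[x]) - math.log(q[x])

def kl_divergence(p, q):
    """Relative entropy D_KL(p||q), Eq. (7), as the p-average of Eq. (8)."""
    return sum(p[x] * stochastic_kl(p, q, x) for x in p if p[x] > 0)

# Hypothetical distributions on a three-state variable.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.5, "c": 0.3}

D = kl_divergence(p, q)
# Left-hand side of Eq. (12): <exp(-d_KL)> = sum_x p(x) * q(x) / p(x).
avg_exp = sum(p[x] * math.exp(-stochastic_kl(p, q, x)) for x in p)

print(D, avg_exp)
```

Individual values of the stochastic relative entropy can be negative (here for x = "b" and x = "c"), while their average D is positive and the exponential average equals one.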

2.3 Mutual information

We discuss the mutual information between two probability variables x and y, which
is an informational measure of correlation [21, 22]. The stochastic mutual information
between x and y is defined as

$$ I(x : y) := \ln \frac{p(x, y)}{p(x) p(y)} = \ln p(x, y) - \ln p(x) - \ln p(y), \tag{16} $$

which can be rewritten as the stochastic relative entropy between p(x, y) and p(x)p(y):

$$ I(x : y) = d_{\rm KL}(p(x, y) \| p(x) p(y)). \tag{17} $$

Its ensemble average is the mutual information:

$$ \langle I(x : y) \rangle = \sum_{x,y} p(x, y) \ln \frac{p(x, y)}{p(x) p(y)} = \langle d_{\rm KL}(p(x, y) \| p(x) p(y)) \rangle. \tag{18} $$

From the nonnegativity of the relative entropy, we have

$$ \langle I(x : y) \rangle \geq 0. \tag{19} $$

The equality is achieved if and only if x and y are stochastically independent, i.e.,
$p(x, y) = p(x) p(y)$.

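The mutual information can also be rewritten as a difference of Shannon
entropies:

$$ \begin{aligned} \langle I(x : y) \rangle &= \langle s(x) \rangle + \langle s(y) \rangle - \langle s(x, y) \rangle \\ &= \langle s(x) \rangle - \langle s(x|y) \rangle \\ &= \langle s(y) \rangle - \langle s(y|x) \rangle. \end{aligned} \tag{20} $$

From the nonnegativity of the conditional Shannon entropy, we find that the mutual
information is bounded by the Shannon entropy:

$$ \langle I(x : y) \rangle \leq \langle s(x) \rangle, \qquad \langle I(x : y) \rangle \leq \langle s(y) \rangle. \tag{21} $$

Figure 2 shows a Venn diagram that summarizes the relationship between the Shannon
entropy and the mutual information.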
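The identities (18) to (21) can be verified on a small joint distribution. The sketch below is only an illustration: the helper names and the example, in which y is a noisy copy of a fair bit x with a 10% error rate, are assumptions and do not appear in the original text.

```python
import math

def marginal(joint, axis):
    """Marginal distribution of a joint table {(x, y): prob} along axis 0 or 1."""
    out = {}
    for key, prob in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + prob
    return out

def entropy(dist):
    """Shannon entropy <s> = -sum p ln p (in nats), Eq. (3)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """<I(x:y)> = sum_{x,y} p(x,y) ln[p(x,y) / (p(x) p(y))], Eq. (18)."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distribution: y is a noisy copy of a fair bit x (10% error).
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

I = mutual_information(joint)
s_x = entropy(marginal(joint, 0))
s_y = entropy(marginal(joint, 1))
s_xy = entropy(joint)
print(I, s_x + s_y - s_xy)   # both sides of the first line of Eq. (20)
```

For a uniform joint distribution over the four outcomes, the same function returns zero, in line with the equality condition of Eq. (19).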

We can also define the stochastic conditional mutual information between x and
y under the condition of another probability variable z as

$$ I(x : y | z) := \ln \frac{p(x, y|z)}{p(x|z) p(y|z)} = d_{\rm KL}(p(x, y|z) \| p(x|z) p(y|z)). \tag{22} $$

Its ensemble average is the conditional mutual information:

$$ \langle I(x : y | z) \rangle := \sum_{x,y,z} p(x, y, z) \ln \frac{p(x, y|z)}{p(x|z) p(y|z)}. \tag{23} $$

We have $\langle I(x : y | z) \rangle \geq 0$, where the equality is achieved if and only if x and y are
conditionally independent, i.e., $p(x, y|z) = p(x|z) p(y|z)$.

Figure 2: Venn diagram of the relationship between the Shannon entropy and the
mutual information.

2.4 Transfer entropy 

The directional information transfer between two stochastic systems can be
characterized by an informational quantity called the transfer entropy [23]. We consider
a sequence of two sets of probability variables: $(x_1, x_2, \dots, x_N, y_1, y_2, \dots, y_N)$.
Intuitively, the state of two interacting systems X and Y at time $k$ $(= 1, 2, \dots, N)$ is
given by $(x_k, y_k)$. The time evolution of the composite system is characterized by the
transition probability $p(x_{k+1}, y_{k+1} | x_1, y_1, \dots, x_k, y_k)$, which is the probability of
$(x_{k+1}, y_{k+1})$ under the condition of $(x_1, y_1, \dots, x_k, y_k)$. The joint probability of all
the variables is given by

$$ p(x_1, \dots, x_N, y_1, \dots, y_N) = \prod_{k=1}^{N-1} p(x_{k+1}, y_{k+1} | x_1, y_1, \dots, x_k, y_k) \cdot p(x_1, y_1). \tag{24} $$

We now consider the information transfer from system X to Y between time $k$ and
$k+1$. We define the stochastic transfer entropy as the stochastic conditional mutual
information:

$$ I^{\rm tr}_{k+1}(X \to Y) := I((x_1, \dots, x_k) : y_{k+1} | y_1, \dots, y_k) = \ln \frac{p(x_1, \dots, x_k, y_{k+1} | y_1, \dots, y_k)}{p(x_1, \dots, x_k | y_1, \dots, y_k) \, p(y_{k+1} | y_1, \dots, y_k)}. \tag{25} $$

Its ensemble average is the transfer entropy:

$$ \begin{aligned} \langle I^{\rm tr}_{k+1}(X \to Y) \rangle &:= \langle I((x_1, \dots, x_k) : y_{k+1} | y_1, \dots, y_k) \rangle \\ &= \sum_{x_1, \dots, x_k, y_1, \dots, y_{k+1}} p(x_1, \dots, x_k, y_1, \dots, y_k, y_{k+1}) \ln \frac{p(x_1, \dots, x_k, y_{k+1} | y_1, \dots, y_k)}{p(x_1, \dots, x_k | y_1, \dots, y_k) \, p(y_{k+1} | y_1, \dots, y_k)}, \end{aligned} \tag{26} $$

which represents the information about the past trajectory $(x_1, x_2, \dots, x_k)$ of system X
that is newly obtained by system Y from time $k$ to $k+1$. While the mutual
information is symmetric between two variables in general, the transfer entropy is
asymmetric between two systems X and Y, as it represents the
directional transfer of information.

Equality (25) can be rewritten as

$$ I^{\rm tr}_{k+1}(X \to Y) = I((x_1, \dots, x_k) : (y_1, \dots, y_k, y_{k+1})) - I((x_1, \dots, x_k) : (y_1, \dots, y_k)), \tag{27} $$

because

$$ \begin{aligned} I((x_1, \dots, x_k) : y_{k+1} | y_1, \dots, y_k) &= \ln \frac{p(x_1, \dots, x_k, y_1, \dots, y_k, y_{k+1}) \, p(y_1, \dots, y_k)}{p(x_1, \dots, x_k, y_1, \dots, y_k) \, p(y_1, \dots, y_k, y_{k+1})} \\ &= \ln \frac{p(x_1, \dots, x_k, y_1, \dots, y_k, y_{k+1})}{p(x_1, \dots, x_k) \, p(y_1, \dots, y_k, y_{k+1})} - \ln \frac{p(x_1, \dots, x_k, y_1, \dots, y_k)}{p(x_1, \dots, x_k) \, p(y_1, \dots, y_k)} \\ &= I((x_1, \dots, x_k) : (y_1, \dots, y_k, y_{k+1})) - I((x_1, \dots, x_k) : (y_1, \dots, y_k)). \end{aligned} \tag{28} $$

Equality (27) clearly shows the meaning of the transfer entropy: the information
about X newly obtained by Y. We note that Eq. (25) can also be rewritten by using
the stochastic conditional Shannon entropy:

$$ I^{\rm tr}_{k+1}(X \to Y) = s(y_{k+1} | y_1, \dots, y_k) - s(y_{k+1} | x_1, \dots, x_k, y_1, \dots, y_k). \tag{29} $$

Therefore, $\langle I^{\rm tr}_{k+1}(X \to Y) \rangle$ describes the reduction of the conditional Shannon entropy
of $y_{k+1}$ due to the information gain about system X, which again confirms the meaning
of the transfer entropy.
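To make the definition concrete, the sketch below evaluates the transfer entropy for the shortest possible case, k = 1, where it reduces to the conditional mutual information between x_1 and y_2 given y_1. The model is a hypothetical one chosen for illustration: x_1 and y_1 are independent fair bits, and y_2 copies x_1 with error rate eps. All function names are assumptions. The code compares the direct expression (26) with the difference of mutual informations in Eq. (27).

```python
import math
from itertools import product

def marginalize(joint, keep):
    """Sum a joint table {(x1, y1, y2): prob} down to the index positions in `keep`."""
    out = {}
    for key, prob in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out

def mutual_info(joint, a, b):
    """<I(a:b)> for two groups of index positions a and b of the joint table."""
    pab, pa, pb = marginalize(joint, a + b), marginalize(joint, a), marginalize(joint, b)
    total = 0.0
    for key, prob in joint.items():
        if prob > 0:
            ka = tuple(key[i] for i in a)
            kb = tuple(key[i] for i in b)
            total += prob * math.log(pab[ka + kb] / (pa[ka] * pb[kb]))
    return total

def transfer_entropy(joint):
    """<I(x1 : y2 | y1)> from Eq. (26), for tables indexed as (x1, y1, y2)."""
    p_x1y1 = marginalize(joint, (0, 1))
    p_y1y2 = marginalize(joint, (1, 2))
    p_y1 = marginalize(joint, (1,))
    total = 0.0
    for (x1, y1, y2), prob in joint.items():
        if prob > 0:
            ratio = prob * p_y1[(y1,)] / (p_x1y1[(x1, y1)] * p_y1y2[(y1, y2)])
            total += prob * math.log(ratio)
    return total

eps = 0.1  # hypothetical error rate of the copy x1 -> y2
joint = {}
for x1, y1, y2 in product((0, 1), repeat=3):
    p_y2 = 1 - eps if y2 == x1 else eps
    joint[(x1, y1, y2)] = 0.5 * 0.5 * p_y2

te_direct = transfer_entropy(joint)
te_eq27 = mutual_info(joint, (0,), (1, 2)) - mutual_info(joint, (0,), (1,))
print(te_direct, te_eq27)
```

In this example the transfer entropy equals ln 2 minus the binary entropy of eps, that is, the information that y_2 gains about x_1 beyond what y_1 already carried (which is nothing here).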


3 Stochastic thermodynamics for Markovian dynamics

We review stochastic thermodynamics of Markovian dynamics [2, 3], which is a
theoretical framework to describe thermodynamic quantities such as the work, the heat,
and the entropy production at the level of thermal fluctuations. In particular, we
discuss the second law of thermodynamics and the fluctuation theorem.

3.1 Setup 

We consider system X that evolves stochastically. We assume the physical situation
that system X is attached to a single heat bath at inverse temperature $\beta := (k_{\rm B} T)^{-1}$,
and that system X is driven by an external control parameter $\lambda$ that describes, for
example, the volume of the gas. We also assume for simplicity that no nonconservative
force is applied to system X. Moreover, we assume that system X does not
include any odd variable that changes its sign under the time-reversal transformation
(e.g., momentum). The generalization beyond these simplifications is straightforward.

Although real physical dynamics are continuous in time, our formulation in this
chapter is discrete in time. Therefore, we discretize time as follows. Suppose that
the real stochastic dynamics of system X is parameterized by continuous time $t$. We
then focus on the state of system X only at discrete times $t_k := k \Delta t$ $(k = 1, 2, \dots, N)$,
where $\Delta t$ is a finite time interval. In the following, we refer to time $t_k$ just as “time
$k$.” Let $x_k$ be the state of system X at time $k$.

We next assume that $\lambda$ takes a fixed value $\lambda_k$ during time interval $t_k \leq t < t_{k+1}$.
The value of $\lambda$ is changed from $\lambda_k$ to $\lambda_{k+1}$ immediately before time $t_{k+1}$ (see also
Fig. 3). We here assume that the time evolution of $\lambda$ is predetermined, independent
of the state of X.

Let $p(x_{k+1} | x_k, \dots, x_1)$ be the conditional probability of state $x_{k+1}$ under the
condition of past trajectory $x_1 \to \cdots \to x_k$. It is natural to assume that this
conditional probability is determined by external parameter $\lambda_k$, which is fixed during
time interval $t_k \leq t < t_{k+1}$; we can show the $\lambda_k$-dependence explicitly by writing
$p(x_{k+1} | x_k, \dots, x_1; \lambda_k)$.


Figure 3: The discretization of the time evolution of the external parameter.

We also assume that the correlation time of the heat bath in the continuous-time
dynamics is much shorter than $\Delta t$. Under this assumption, the discretized time
evolution $x_1 \to x_2 \to \cdots \to x_N$ can be regarded as Markovian. We note that, if the
continuous-time dynamics itself is Markovian, the discretized dynamics is obviously
Markovian. From the Markovian assumption, we have

$$ p(x_{k+1} | x_k, \dots, x_1; \lambda_k) = p(x_{k+1} | x_k; \lambda_k), \tag{30} $$

which we sometimes write as, for simplicity of notation,

$$ p(x_{k+1} | x_k) := p(x_{k+1} | x_k; \lambda_k). \tag{31} $$

The joint probability distribution of $(x_1, x_2, \dots, x_N)$ is then given by

$$ p(x_1, x_2, \dots, x_N) := p(x_N | x_{N-1}) \cdots p(x_3 | x_2) p(x_2 | x_1) p(x_1). \tag{32} $$

To make the notation simpler, we define the set $\mathcal{X} := \{x_1, x_2, \dots, x_N\}$ and denote

$$ p(\mathcal{X}) := p(x_1, x_2, \dots, x_N). \tag{33} $$

Strictly speaking, the set $\{x_1, x_2, \dots, x_N\}$ is not the same as the vector $(x_1, x_2, \dots, x_N)$.
However, we sometimes do not distinguish them in the notation for the sake of simplicity.

3.2 Energetics 

We now consider the energy change in system X, and discuss the first law of
thermodynamics. Let $E(x_k; \lambda_k)$ be the energy (or the Hamiltonian) of system X at time $t_k$,
which depends on external parameter $\lambda_k$ as well as state $x_k$. The energy change in
system X is decomposed into two parts: the heat and the work. The heat is the energy
change in X due to the stochastic change of the state of X induced by the heat bath,
and the work is the energy change due to the change of external parameter $\lambda$. We
stress that the heat and the work are defined at the level of stochastic trajectories in
stochastic thermodynamics [2].

The heat absorbed by system X from the heat bath during time interval
$t_k \leq t < t_{k+1}$ is given by

$$ Q_k := E(x_{k+1}; \lambda_k) - E(x_k; \lambda_k), \tag{34} $$

which is a stochastic quantity due to the stochasticity of $x_k$ and $x_{k+1}$. On the other
hand, the work is performed at time $k$, at which the external parameter is changed.
The work performed on system X at time $k$ is given by (see also Fig. 3)

$$ W_k := E(x_k; \lambda_k) - E(x_k; \lambda_{k-1}), \tag{35} $$

which is also a stochastic quantity.

The total heat absorbed by system X from time 1 to $N$ along trajectory $(x_1, x_2, \dots, x_N)$
is then given by

$$ Q := \sum_{k=1}^{N-1} Q_k, \tag{36} $$

and the total work is given by

$$ W := \sum_{k=2}^{N} W_k. \tag{37} $$

It is easy to check that the total heat and the total work satisfy the first law of
thermodynamics:

$$ \Delta E = Q + W, \tag{38} $$

where

$$ \Delta E := E(x_N; \lambda_N) - E(x_1; \lambda_1) \tag{39} $$

is the total energy change. We note that Eq. (38) is the first law at the level of
individual trajectories.
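Because Eq. (38) is an identity for each trajectory, it can be checked on any sequence of states, without reference to the dynamics that generated it. The sketch below does this for a hypothetical two-level system with energy E(x; lambda) = lambda x; the energy function, the linear protocol, and the randomly drawn trajectory are all illustrative assumptions.

```python
import random

def energy(x, lam):
    """Hypothetical two-level energy E(x; lambda) = lambda * x for x in {0, 1}."""
    return lam * x

def heat_and_work(xs, lams):
    """Total heat Q (Eqs. 34 and 36) and total work W (Eqs. 35 and 37).

    xs[k-1] and lams[k-1] hold x_k and lambda_k for k = 1, ..., N."""
    N = len(xs)
    # Q_k = E(x_{k+1}; lambda_k) - E(x_k; lambda_k) for k = 1, ..., N-1
    Q = sum(energy(xs[k + 1], lams[k]) - energy(xs[k], lams[k]) for k in range(N - 1))
    # W_k = E(x_k; lambda_k) - E(x_k; lambda_{k-1}) for k = 2, ..., N
    W = sum(energy(xs[k], lams[k]) - energy(xs[k], lams[k - 1]) for k in range(1, N))
    return Q, W

random.seed(0)
N = 6
lams = [0.5 * k for k in range(1, N + 1)]       # predetermined protocol
xs = [random.randint(0, 1) for _ in range(N)]   # an arbitrary trajectory
Q, W = heat_and_work(xs, lams)
dE = energy(xs[-1], lams[-1]) - energy(xs[0], lams[0])
print(Q + W, dE)   # the two numbers coincide, as in Eq. (38)
```

The heat terms change the state at fixed parameter and the work terms change the parameter at fixed state, so the sum telescopes to the total energy change.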

3.3 Entropy production and fluctuation theorem

We next consider the second law of thermodynamics. We start from the concept of
detailed balance, which is satisfied in the absence of any nonconservative force.
The detailed balance condition for the transition from time $k$ to $k+1$ is given by

$$ p(x_{k+1} | x_k; \lambda_k) \, e^{-\beta E(x_k; \lambda_k)} = p(x_k | x_{k+1}; \lambda_k) \, e^{-\beta E(x_{k+1}; \lambda_k)}, \tag{40} $$

where $p(x_k | x_{k+1}; \lambda_k)$ describes the “backward” transition probability from $x_{k+1}$ to $x_k$
under external parameter $\lambda_k$. From the definition of heat (34), Eq. (40) can also be
written as

$$ \frac{p(x_{k+1} | x_k; \lambda_k)}{p(x_k | x_{k+1}; \lambda_k)} = e^{-\beta Q_k}. \tag{41} $$

The detailed balance condition (40) implies that, if the external parameter is fixed
at $\lambda_k$ and is not changed in time, the steady distribution of system X becomes the
canonical distribution

$$ p_{\rm eq}(x_k; \lambda_k) := e^{\beta (F(\lambda_k) - E(x_k; \lambda_k))}, \tag{42} $$

where $F(\lambda_k) := -\beta^{-1} \ln \sum_{x_k} e^{-\beta E(x_k; \lambda_k)}$ is the free energy. In fact, it is easy to
check that

$$ \sum_{x_k} p(x_{k+1} | x_k; \lambda_k) \, p_{\rm eq}(x_k; \lambda_k) = p_{\rm eq}(x_{k+1}; \lambda_k). \tag{43} $$

It is known that the expression of the detailed balance (40) is valid for a much
broader class of dynamics than the present setup. In fact, it is known that Eq. (40)
is valid for Langevin dynamics even in the presence of nonconservative force.
Moreover, a slightly modified form of Eq. (40) is valid for nonequilibrium dynamics
with multiple heat baths at different temperatures [8]. Therefore, we regard Eq. (40)
as a starting point of the following argument.
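The relation between detailed balance and the canonical distribution can be illustrated with an explicit transition matrix. The sketch below builds one with the Metropolis rule, which is an assumed choice made here for illustration (the text does not specify a particular rule; any rule satisfying Eq. (40) would serve), and then checks the ratio form (41) and the stationarity condition (43). The energy levels are hypothetical.

```python
import math

def canonical(energies, beta):
    """Canonical distribution p_eq(x) = exp(-beta E(x)) / Z, equivalent to Eq. (42)."""
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

def metropolis_matrix(energies, beta):
    """Transition matrix T[x_next][x] = p(x_next | x) from the Metropolis rule
    with a uniform proposal over the other states (an assumed choice)."""
    n = len(energies)
    T = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for x_next in range(n):
            if x_next != x:
                accept = min(1.0, math.exp(-beta * (energies[x_next] - energies[x])))
                T[x_next][x] = accept / (n - 1)
        # The remaining probability is assigned to staying in the same state.
        T[x][x] = 1.0 - sum(T[x_next][x] for x_next in range(n) if x_next != x)
    return T

beta = 1.0
energies = [0.0, 0.7, 1.5]   # hypothetical E(x; lambda_k) at fixed lambda_k
T = metropolis_matrix(energies, beta)
p_eq = canonical(energies, beta)

# Eq. (41): p(x'|x) / p(x|x') = exp(-beta Q) with Q = E(x') - E(x).
db_ok = all(
    abs(T[b][a] / T[a][b] - math.exp(-beta * (energies[b] - energies[a]))) < 1e-12
    for a in range(3) for b in range(3) if a != b
)
# Eq. (43): sum_x p(x'|x) p_eq(x) = p_eq(x').
stationary = [sum(T[b][a] * p_eq[a] for a in range(3)) for b in range(3)]
print(db_ok, stationary, p_eq)
```

Applying the matrix once to the canonical distribution returns the same distribution, which is the content of Eq. (43).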

We now consider the entropy production, which is the sum of the entropy changes
in system X and the heat bath. The stochastic entropy change in system X from
time $k$ to $k+1$ is given by

$$ \Delta s^X_k := s(x_{k+1}) - s(x_k), \tag{44} $$

where $s(x_k) := -\ln p(x_k)$ is the stochastic Shannon entropy. The ensemble average
of Eq. (44) gives the change in the Shannon entropy as
$\langle \Delta s^X_k \rangle := \langle s(x_{k+1}) \rangle - \langle s(x_k) \rangle$. The
total stochastic entropy change in X from time 1 to $N$ is given by

$$ \Delta s^X := \sum_{k=1}^{N-1} \Delta s^X_k = s(x_N) - s(x_1), \tag{45} $$

which is also written as

$$ \Delta s^X = \ln \frac{p(x_1)}{p(x_N)}. \tag{46} $$

The stochastic entropy change in the heat bath is identified with the heat
dissipated into the bath [5]:

$$ \Delta s^{\rm bath}_k := -\beta Q_k. \tag{47} $$

From Eq. (41), Eq. (47) can also be rewritten as

$$ \Delta s^{\rm bath}_k = \ln \frac{p(x_{k+1} | x_k; \lambda_k)}{p(x_k | x_{k+1}; \lambda_k)}. \tag{48} $$

The total stochastic entropy change in the heat bath from time 1 to $N$ is then given
by

$$ \Delta s^{\rm bath} := \sum_{k=1}^{N-1} \Delta s^{\rm bath}_k = -\beta Q, \tag{49} $$

which can be rewritten as

$$ \Delta s^{\rm bath} = \ln \frac{p(x_N | x_{N-1}; \lambda_{N-1}) \cdots p(x_3 | x_2; \lambda_2) \, p(x_2 | x_1; \lambda_1)}{p(x_1 | x_2; \lambda_1) \, p(x_2 | x_3; \lambda_2) \cdots p(x_{N-1} | x_N; \lambda_{N-1})}. \tag{50} $$

The total stochastic entropy production of system X and the heat bath from time
$k$ to $k+1$ is then defined as

$$ \sigma_k := \Delta s^X_k + \Delta s^{\rm bath}_k, \tag{51} $$

and that from time 1 to $N$ is defined as

$$ \sigma := \sum_{k=1}^{N-1} \sigma_k = \Delta s^X + \Delta s^{\rm bath}. \tag{52} $$

The entropy production $\langle \sigma \rangle$ is defined as the average of $\sigma$, where $\langle \cdots \rangle$ denotes the
ensemble average over probability distribution $p(\mathcal{X})$. From Eqs. (46) and (50), we
obtain

$$ \sigma = \ln \frac{p(x_N | x_{N-1}; \lambda_{N-1}) \cdots p(x_3 | x_2; \lambda_2) \, p(x_2 | x_1; \lambda_1) \, p(x_1)}{p(x_1 | x_2; \lambda_1) \, p(x_2 | x_3; \lambda_2) \cdots p(x_{N-1} | x_N; \lambda_{N-1}) \, p(x_N)}, \tag{53} $$

which is sometimes referred to as the detailed fluctuation theorem [8].

We discuss the meaning of the probability distributions in the right-hand side of
Eq. (53). First, we recall that the probability distribution of $\mathcal{X}$ is given by

$$ p(\mathcal{X}) := p(x_N | x_{N-1}; \lambda_{N-1}) \cdots p(x_3 | x_2; \lambda_2) \, p(x_2 | x_1; \lambda_1) \, p(x_1), \tag{54} $$

which describes the probability of trajectory $x_1 \to x_2 \to \cdots \to x_N$ with the time
evolution of the external parameter $\lambda_1 \to \lambda_2 \to \cdots \to \lambda_N$. On the other hand,

$$ p_{\rm B}(\mathcal{X}) := p(x_1 | x_2; \lambda_1) \, p(x_2 | x_3; \lambda_2) \cdots p(x_{N-1} | x_N; \lambda_{N-1}) \, p(x_N) \tag{55} $$

is regarded as the probability of the “backward” trajectory $x_N \to x_{N-1} \to \cdots \to x_1$
starting from the initial distribution $p(x_N)$, where the time evolution of the external
parameter is also time-reversed as $\lambda_N \to \lambda_{N-1} \to \cdots \to \lambda_1$. In other words, $p_{\rm B}(\mathcal{X})$
describes the probability of the time-reversal of the original dynamics. To emphasize
this, we introduced the suffix “B” in $p_{\rm B}(\mathcal{X})$, which stands for “backward.” We also write

$$ p_{\rm B}(x_{k-1} | x_k) := p(x_{k-1} | x_k; \lambda_{k-1}). \tag{56} $$

We again stress that $p_{\rm B}(\mathcal{X})$ is different from the original probability $p(\mathcal{X})$, but
describes the probability of the time-reversed trajectory with the time-reversed time
evolution of the external parameter. By using notations (54) and (55), Eq. (53) can
be written in a simplified way:

$$ \sigma = \ln \frac{p(\mathcal{X})}{p_{\rm B}(\mathcal{X})}. \tag{57} $$

In Eqs. (53) and (57), the entropy production is determined by the ratio of the
probabilities of a trajectory and its time-reversal. This implies that the entropy
production is a measure of irreversibility.

We consider the second law of thermodynamics, which states that the average
entropy production is nonnegative:

$$ \langle \sigma \rangle \geq 0. \tag{58} $$

This is a straightforward consequence of the definition of $\sigma$, as shown below. We first
note that Eq. (57) can be rewritten by using the stochastic relative entropy defined
in Eq. (8):

$$ \sigma = d_{\rm KL}(p(\mathcal{X}) \| p_{\rm B}(\mathcal{X})). \tag{59} $$

By taking the ensemble average of $d_{\rm KL}(p(\mathcal{X}) \| p_{\rm B}(\mathcal{X}))$ with the probability distribution
$p(\mathcal{X})$, we find that $\langle \sigma \rangle$ is equal to the relative entropy between $p(\mathcal{X})$ and $p_{\rm B}(\mathcal{X})$:

$$ \langle \sigma \rangle = \langle d_{\rm KL}(p(\mathcal{X}) \| p_{\rm B}(\mathcal{X})) \rangle =: D_{\rm KL}(p \| p_{\rm B}), \tag{60} $$

which is nonnegative and implies inequality (58).

The second law (58) can be shown in another way as follows. We first show that

$$ \langle \exp(-\sigma) \rangle = 1, \tag{61} $$

because

$$ \langle \exp(-\sigma) \rangle = \sum_{\mathcal{X}} p(\mathcal{X}) \frac{p_{\rm B}(\mathcal{X})}{p(\mathcal{X})} = \sum_{\mathcal{X}} p_{\rm B}(\mathcal{X}) = 1. \tag{62} $$

Equality (61) is called the integral fluctuation theorem [7, 9]. By applying the Jensen
inequality, we obtain

$$ \langle \exp(-\sigma) \rangle \geq \exp(-\langle \sigma \rangle), \tag{63} $$

which, along with Eq. (61), leads to the second law (58). We note that Eq. (61) can
be regarded as a special case of Eq. (12), and the above proof of inequality (58) is
parallel to the argument below Eq. (12).
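For a small system, the integral fluctuation theorem can be verified by exact enumeration of all trajectories. The sketch below uses a hypothetical two-level system with energy E(x; lambda) = lambda x and an assumed transition rule that, with probability `rate`, redraws the state from the canonical distribution at the current parameter; this rule satisfies detailed balance (40) but is otherwise an arbitrary illustrative choice, as are the protocol and the initial distribution.

```python
import math
from itertools import product

def transition(x_next, x, lam, beta=1.0, rate=0.5):
    """p(x_next | x; lam) for a two-level system with E(x; lam) = lam * x.
    With probability `rate` the state is redrawn from the canonical distribution
    at parameter lam; otherwise it stays put. This satisfies Eq. (40)."""
    z = 1.0 + math.exp(-beta * lam)
    p_eq = math.exp(-beta * lam * x_next) / z
    stay = 1.0 - rate if x_next == x else 0.0
    return stay + rate * p_eq

N = 4
lams = [0.0, 1.0, 2.0]        # lambda_1, ..., lambda_{N-1} (hypothetical protocol)
p_init = {0: 0.8, 1: 0.2}     # arbitrary initial distribution p(x_1)

def forward_prob(traj):
    """p(X) of Eq. (54)."""
    prob = p_init[traj[0]]
    for k in range(N - 1):
        prob *= transition(traj[k + 1], traj[k], lams[k])
    return prob

trajectories = list(product((0, 1), repeat=N))

# Marginal p(x_N), which serves as the initial distribution of the backward path.
p_final = {0: 0.0, 1: 0.0}
for traj in trajectories:
    p_final[traj[-1]] += forward_prob(traj)

def backward_prob(traj):
    """p_B(X) of Eq. (55): reversed path under the reversed protocol."""
    prob = p_final[traj[-1]]
    for k in range(N - 1):
        prob *= transition(traj[k], traj[k + 1], lams[k])
    return prob

avg_exp = 0.0     # <exp(-sigma)>, Eq. (61)
avg_sigma = 0.0   # <sigma>, Eq. (58)
for traj in trajectories:
    p_f, p_b = forward_prob(traj), backward_prob(traj)
    sigma = math.log(p_f / p_b)   # Eq. (57)
    avg_exp += p_f * math.exp(-sigma)
    avg_sigma += p_f * sigma
print(avg_exp, avg_sigma)
```

Individual trajectories can have negative sigma, but the exponential average is one and the plain average is nonnegative, which is the statement that the second law may be violated stochastically but never on average.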

We next consider the physical meaning of the entropy production for a special case,
and relate the entropy production to the work and the free energy. Suppose that the
initial and the final probability distributions are given by the canonical distributions
such that $p(x_1) = p_{\rm eq}(x_1; \lambda_1)$ and $p(x_N) = p_{\rm eq}(x_N; \lambda_N)$. In this case, the stochastic
Shannon entropy change is given by

$$ \Delta s^X = \ln \frac{p_{\rm eq}(x_1; \lambda_1)}{p_{\rm eq}(x_N; \lambda_N)} = -\beta (\Delta F - \Delta E), \tag{64} $$

where $\Delta F := F(\lambda_N) - F(\lambda_1)$ is the free energy change and
$\Delta E := E(x_N; \lambda_N) - E(x_1; \lambda_1)$ is the energy change. Therefore, the stochastic entropy
production is given by

$$ \sigma = \Delta s^X - \beta Q = \beta (-\Delta F + \Delta E - Q). \tag{65} $$

By using the first law of thermodynamics (38), we obtain

$$ \sigma = \beta (W - \Delta F). \tag{66} $$

Equality (66) gives the energetic interpretation of the entropy production for
transitions between equilibrium states. In this case, the integral fluctuation theorem (61)
reduces to

$$ \langle \exp(-\beta (W - \Delta F)) \rangle = 1, \tag{67} $$

which is called the Jarzynski equality [7]. It can also be shown that Eq. (67) is still
valid even when the final distribution is out of equilibrium [7]. The second law of
thermodynamics (58) then reduces to

$$ \langle W \rangle \geq \Delta F, \tag{68} $$

which is a well-known energetic expression of the second law: the free energy increase
cannot be larger than the performed work.


4 Bayesian networks 

In this section, we review the basic concepts of Bayesian networks [64-69], which
represent causal structures of stochastic dynamics with directed acyclic graphs.

We first define the directed acyclic graph (see also Fig. 4). A directed graph
$\mathcal{G} := \{\mathcal{V}, \mathcal{E}\}$ is given by a finite set of nodes $\mathcal{V}$ and a finite set of directed edges $\mathcal{E}$.
We write the set of nodes as

$$ \mathcal{V} = \{a_1, \dots, a_{N_V}\}, \tag{69} $$

where $a_j$ is a node and $N_V$ is the number of nodes. The set of directed edges $\mathcal{E}$ is
given by a subset of all ordered pairs of distinct nodes in $\mathcal{V}$:

$$ \mathcal{E} \subseteq \{a_j \to a_{j'} \mid a_j, a_{j'} \in \mathcal{V}, \ a_j \neq a_{j'}\}. \tag{70} $$

Intuitively, $\mathcal{V}$ is the set of events, and their causal relationship is represented by $\mathcal{E}$. If
$(a_j \to a_{j'}) \in \mathcal{E}$, we say that $a_j$ is a parent of $a_{j'}$ (or equivalently, $a_{j'}$ is a child of $a_j$).
We write as ${\rm pa}(a_j)$ the set of parents of $a_j$ (see also Fig. 5):

$$ {\rm pa}(a_j) := \{a_{j'} \mid (a_{j'} \to a_j) \in \mathcal{E}\}. \tag{71} $$


Figure 4: Example of a simple directed acyclic graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ with
$\mathcal{V} = \{a_1, a_2, a_3\}$ and $\mathcal{E} = \{a_1 \to a_2, a_1 \to a_3\}$. Circles represent nodes and
arrows represent edges.

A directed graph is called acyclic if $\mathcal{E}$ does not include any directed cyclic path.
In other words, a directed graph is cyclic if there exists $(j, j^{(1)}, \dots, j^{(n)})$ such
that $\{a_j \to a_{j^{(1)}}, a_{j^{(1)}} \to a_{j^{(2)}}, \dots, a_{j^{(n-1)}} \to a_{j^{(n)}}, a_{j^{(n)}} \to a_j\} \subseteq \mathcal{E}$; otherwise, it is
acyclic. The acyclic property implies that the causal structure does not include any
“time loop.” If a directed graph is acyclic, we can define the concept of topological
ordering. An ordering of $\mathcal{V}$, written as $(a_1, a_2, \dots, a_{N_V})$, is called a topological ordering
if $a_j$ is not a parent of $a_{j'}$ for $j > j'$. We then define the set of ancestors of $a_j$ by
${\rm an}(a_j) := \{a_{j-1}, \dots, a_1\}$ (${\rm an}(a_1) := \emptyset$). We note that a topological ordering is not
necessarily unique.

We show a simple example of a directed acyclic graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ with
$\mathcal{V} = \{a_1, a_2, a_3\}$ and $\mathcal{E} = \{a_1 \to a_2, a_1 \to a_3\}$ in Fig. 4. A node is described by a circle
with variable $a_j$, and a directed edge is described by a directed arrow between two
nodes. In Fig. 4, the sets of parents are given by ${\rm pa}(a_1) = \emptyset$, ${\rm pa}(a_2) = \{a_1\}$, and
${\rm pa}(a_3) = \{a_1\}$, where $\emptyset$ denotes the empty set. In this case, we have two topological
orderings: $(a_1, a_2, a_3)$ and $(a_1, a_3, a_2)$.
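The example of Fig. 4 is small enough to enumerate by brute force. The sketch below lists all orderings of the three nodes and keeps those in which every parent precedes its child; the node labels and function names are illustrative assumptions, and for larger graphs a dedicated topological-sort algorithm would be preferable to testing all permutations.

```python
from itertools import permutations

def parents(node, edges):
    """pa(a_j): the nodes with a directed edge toward `node`, Eq. (71)."""
    return {src for (src, dst) in edges if dst == node}

def is_topological(order, edges):
    """True if every parent appears before its child in `order`."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[src] < position[dst] for (src, dst) in edges)

# The graph of Fig. 4: V = {a1, a2, a3}, E = {a1 -> a2, a1 -> a3}.
nodes = ["a1", "a2", "a3"]
edges = {("a1", "a2"), ("a1", "a3")}

orderings = [order for order in permutations(nodes) if is_topological(order, edges)]
print(orderings)
print({node: parents(node, edges) for node in nodes})
```

The enumeration returns exactly the two orderings named in the text, both of which place a1 first because it is the only node without parents.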

We next consider a probability distribution on a directed acyclic graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$,
which is a key concept for Bayesian networks. A directed edge $(a_j \to a_{j'}) \in \mathcal{E}$ on a
Bayesian network represents the probabilistic dependence (i.e., causal relationship)
between two nodes $a_j$ and $a_{j'}$. Therefore, variable $a_j$ only depends on its parents
${\rm pa}(a_j)$. The causal relationship can be described by the conditional probability of $a_j$
under the condition of ${\rm pa}(a_j)$, written as $p(a_j | {\rm pa}(a_j))$. If ${\rm pa}(a_j) = \emptyset$,
$p(a_j) := p(a_j | \emptyset)$ is just the probability of $a_j$. The joint probability distribution of all
the nodes in a Bayesian network is then defined as

$$ p(\mathcal{V}) := \prod_{j=1}^{N_V} p(a_j | {\rm pa}(a_j)), \tag{72} $$

which implies that the probability of a node is only determined by its parents. This
definition represents the causal structure of Bayesian networks; the cause of a node
is given by its parents.

Figure 5: Schematic of the parents of $a_j$. The set of the parents, ${\rm pa}(a_j)$, is defined as
the set of the nodes that have directed edges toward $a_j$. This figure also illustrates
the setup of Eq. (80).

In Fig. 6, we show two simple examples of Bayesian networks. For Fig. 6(a), the
joint distribution is given by

$$ p(a_1, a_2, a_3) := p(a_3 | a_2) \, p(a_2 | a_1) \, p(a_1), \tag{73} $$

which describes a simple Markovian process. Figure 6(b) is a little less trivial; its
joint distribution is given by

$$ p(a_1, a_2, \dots, a_6) = p(a_6 | a_1, a_4, a_5) \, p(a_5 | a_3) \, p(a_4 | a_2) \, p(a_3 | a_1) \, p(a_2 | a_1) \, p(a_1). \tag{74} $$
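The factorization (72) translates directly into code. The sketch below builds the joint distribution of the three-node chain of Eq. (73) from conditional probability tables and checks that it is normalized and that conditioning on a1 does not change the probability of a3 once a2 is known. The binary state space, the table layout, and all numerical values are hypothetical choices made for illustration.

```python
from itertools import product

# Chain a1 -> a2 -> a3 of Eq. (73); all numbers are hypothetical.
parents = {"a1": (), "a2": ("a1",), "a3": ("a2",)}
# cpt[node][parent_values][value] = p(node = value | parents = parent_values)
cpt = {
    "a1": {(): {0: 0.6, 1: 0.4}},
    "a2": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "a3": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.5, 1: 0.5}},
}
order = ["a1", "a2", "a3"]  # a topological ordering of the nodes

def joint(assignment):
    """p(V) = prod_j p(a_j | pa(a_j)), Eq. (72)."""
    prob = 1.0
    for node in order:
        pa_values = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][pa_values][assignment[node]]
    return prob

def marginal(fixed):
    """Probability of the partial assignment `fixed`, obtained by summing
    the joint distribution over all remaining nodes."""
    free = [n for n in order if n not in fixed]
    total = 0.0
    for values in product((0, 1), repeat=len(free)):
        total += joint({**fixed, **dict(zip(free, values))})
    return total

total = marginal({})
# p(a3 = 1 | a1 = 1, a2 = 0) versus p(a3 = 1 | a2 = 0)
lhs = marginal({"a1": 1, "a2": 0, "a3": 1}) / marginal({"a1": 1, "a2": 0})
rhs = marginal({"a2": 0, "a3": 1}) / marginal({"a2": 0})
print(total, lhs, rhs)
```

Both conditional probabilities equal the table entry for a3 = 1 given a2 = 0, which reflects the Markovian character of the chain: a3 depends on the past only through its parent a2.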

For any subset of nodes M C V, the probability distribution on A is given by 

p(A) = Y,p(V). (75) 

V\^ 

For A, A! C V, the joint probability distribution is given by 

p(M,M')= p(V). ( 76 ) 

V\(.4U^') 

The conditional probability is then given by the Bayes rule: 


p{A\A!) 


p{A') 


( 77 ) 


Figure 6: Simple examples of Bayesian networks. 

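Let $A(\mathcal{V})$ be a probability variable that depends on nodes in $\mathcal{V}$. The ensemble average
of $A(\mathcal{V})$ is defined as

$$
\langle A(\mathcal{V}) \rangle := \sum_{\mathcal{V}} A(\mathcal{V})\,p(\mathcal{V}). \tag{78}
$$

In particular, if $A$ depends only on $\mathcal{A} \subseteq \mathcal{V}$, Eq. (78) reduces to

$$
\langle A(\mathcal{A}) \rangle := \sum_{\mathcal{A}} A(\mathcal{A})\,p(\mathcal{A}) = \sum_{\mathcal{V}} A(\mathcal{A})\,p(\mathcal{V}). \tag{79}
$$

We note that $p(a_j|\mathrm{an}(a_j)) = p(a_j|\mathrm{pa}(a_j))$ holds by definition, which implies that
any probability variable directly depends only on its nearest ancestors (i.e., parents). This
is consistent with the description of directed acyclic graphs. In general, we have

$$
p(a_j|\mathrm{pa}(a_j), \mathcal{V}') = p(a_j|\mathrm{pa}(a_j)) \tag{80}
$$

for any $\mathcal{V}' \subseteq \mathrm{an}(a_j) \setminus \mathrm{pa}(a_j)$ (see also Fig. 5).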
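The marginalization rules (75)-(77) and the property (80) can be checked on a small example. The following sketch assumes binary nodes on the graph $a_1 \to a_2$, $a_1 \to a_3$ with hypothetical, purely illustrative tables; the helpers `marginal` and `conditional` are names introduced here, not part of the original text.

```python
from itertools import product

NAMES = ("a1", "a2", "a3")
VALUES = (0, 1)

# Hypothetical tables for the graph a1 -> a2, a1 -> a3 (illustrative only).
p_a1 = {0: 0.6, 1: 0.4}
p_a2_given_a1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [a1][a2]
p_a3_given_a1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}  # indexed [a1][a3]


def joint(v):
    """Joint probability p(V) for a full assignment v = {name: value}."""
    return (p_a1[v["a1"]]
            * p_a2_given_a1[v["a1"]][v["a2"]]
            * p_a3_given_a1[v["a1"]][v["a3"]])


def marginal(fixed):
    """p(A): sum p(V) over all nodes not fixed in `fixed`, Eqs. (75)-(76)."""
    free = [name for name in NAMES if name not in fixed]
    total = 0.0
    for values in product(VALUES, repeat=len(free)):
        full = dict(fixed)
        full.update(zip(free, values))
        total += joint(full)
    return total


def conditional(target, given):
    """p(A | A') = p(A, A') / p(A'), Eq. (77)."""
    both = dict(given)
    both.update(target)
    return marginal(both) / marginal(given)


# With the ordering (a1, a2, a3): an(a3) = {a2, a1} and pa(a3) = {a1},
# so Eq. (80) with V' = {a2} states p(a3 | a1, a2) = p(a3 | a1).
for a1, a2, a3 in product(VALUES, repeat=3):
    lhs = conditional({"a3": a3}, {"a1": a1, "a2": a2})
    rhs = conditional({"a3": a3}, {"a1": a1})
    assert abs(lhs - rhs) < 1e-12
```

Conditioning on the non-parent ancestor $a_2$ does not change the distribution of $a_3$ once its parent $a_1$ is given, which is the content of Eq. (80) for this graph.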

5 Information thermodynamics on Bayesian networks

We now discuss a general framework of stochastic thermodynamics for complex dynamics
described by Bayesian networks [52], where system $X$ is in contact with systems
$C$ in addition to the heat bath. In particular, we derive the generalized second law
of thermodynamics, which states that the entropy production is bounded from below by an
informational quantity that consists of the initial and final mutual information between $X$ and $C$,
and the transfer entropy from $X$ to $C$.

5.1 Setup

First of all, we discuss how Bayesian networks represent the causal relationships in
physical dynamics. We consider a situation in which several physical systems interact
with each other and stochastically evolve in time. A probability variable associated
with a node, $a_j \in \mathcal{V}$, represents a state of one of the systems at a particular time. We
assume that the topological ordering $(a_1, \dots, a_{N_V})$ describes the time ordering; the
time of state $a_j$ should not be later than the time of state $a_{j+1}$. This assumption does
not exclude a situation in which $a_j$ and $a_{j+1}$ are states of different systems at the same
time. Each edge in $\mathcal{E}$ describes the causal relationship between states of the systems
at different times. Correspondingly, the conditional probability $p(a_j|\mathrm{pa}(a_j))$ characterizes
the stochastic dynamics. The joint probability $p(\mathcal{V})$ represents the probability
of trajectories of the whole system.

We focus on a particular system $X$, whose time evolution is described by a set of
nodes. Let $\mathcal{X} := \{x_1, \dots, x_N\} \subseteq \mathcal{V}$ be the set of nodes that describe states of $X$, and
let $(x_1, \dots, x_N)$ be the topological ordering of the elements of $\mathcal{X}$, where we refer to
the suffixes as "time." A probability variable $x_k$ in $\mathcal{X}$ describes the state of system
$X$ at time $k$. We assume that there is a causal relationship between $x_k$ and $x_{k+1}$ such
that

$$
x_k \in \mathrm{pa}(x_{k+1}). \tag{81}
$$

For simplicity, we also assume that

$$
\mathrm{pa}(x_{k+1}) \cap \mathcal{X} = \{x_k\}, \tag{82}
$$

which does not exclude the situation that there are nodes in $\mathrm{pa}(x_{k+1})$ outside of $\mathcal{X}$
(see Fig. 7).

We next consider the systems other than $X$, which we refer to as $C$. The states
of $C$ are given by the nodes in set $\mathcal{C} := \mathcal{V} \setminus \mathcal{X}$ (see also Fig. 7). Let $(c_1, c_2, \dots, c_{N'})$
be the topological ordering of $\mathcal{C}$, where we again refer to the suffixes as "time." A
probability variable $c_l$ describes the state of $C$ at time $l$. Since $\mathcal{V} = \mathcal{X} \cup \mathcal{C}$, we can
define a joint topological ordering of $\mathcal{V}$ as

$$
(c_1, \dots, c_{l(1)}, x_1, c_{l(1)+1}, \dots, c_{l(2)}, x_2, c_{l(2)+1}, \dots, c_{l(N)}, x_N, c_{l(N)+1}, \dots, c_{N'}), \tag{83}
$$

where the ordering $(c_1, \dots, c_{l(1)}, \dots, c_{l(2)}, \dots, c_{N'})$ is the same as the ordering $(c_1, c_2, \dots, c_{N'})$.
The joint probability distribution $p(\mathcal{X}, \mathcal{C})$ can be obtained from Eq. (72):

$$
p(\mathcal{X}, \mathcal{C}) = \prod_{k=1}^{N} p(x_k|\mathrm{pa}(x_k)) \prod_{l=1}^{N'} p(c_l|\mathrm{pa}(c_l)), \tag{84}
$$

where the conditional probability $p(x_{k+1}|\mathrm{pa}(x_{k+1}))$ represents the transition probability
of system $X$ from time $k$ to $k+1$. We note that the dynamics in $\mathcal{V}$ can be
non-Markovian due to the non-Markovian property of $C$. We summarize the notations
in Table 1.

Figure 7: Schematic of the physical setup of Bayesian networks. The time evolution of
system $X$ is given by the sequence of nodes $\mathcal{X} = \{x_1, \dots, x_N\}$, and the time evolution
of $C$ is given by $\mathcal{C} := \mathcal{V} \setminus \mathcal{X} = \{c_1, \dots, c_{N'}\}$.

5.2 Information contents on Bayesian networks 

We consider information contents on Bayesian networks: the initial and the final
mutual information between $X$ and $C$, and the transfer entropy from $X$ to $C$.

We first consider the initial correlation of the dynamics. The initial state $x_1$ of
$X$ is initially correlated with its parents $\mathrm{pa}(x_1) \subseteq \mathcal{C}$. The initial correlation between
system $X$ and $C$ is then characterized by the mutual information between $x_1$ and
$\mathrm{pa}(x_1)$. The corresponding stochastic mutual information is given by (see also Fig. 8(a))

$$
I_{\rm ini} := I(x_1 : \mathrm{pa}(x_1)). \tag{85}
$$

Its ensemble average $\langle I_{\rm ini} \rangle$ is the mutual information of the initial correlation. It
vanishes if and only if $p(x_1|\mathrm{pa}(x_1)) = p(x_1)$, or equivalently, $\mathrm{pa}(x_1) = \emptyset$.

We next consider the final correlation of the dynamics. The final state of $X$ is
given by $x_N \in \mathcal{X}$, which is correlated with its ancestors $\mathrm{an}(x_N)$. The final correlation
between system $X$ and $C$ is then characterized by the mutual information between
$x_N$ and $\mathcal{C}' := \mathrm{an}(x_N) \cap \mathcal{C}$. The corresponding stochastic mutual information is given
by (see also Fig. 8(b))

$$
I_{\rm fin} := I(x_N : \mathcal{C}'). \tag{86}
$$

Its ensemble average $\langle I_{\rm fin} \rangle$ is the mutual information of the final correlation. It
vanishes if and only if $p(x_N|\mathcal{C}') = p(x_N)$.

| Notation | Meaning |
|---|---|
| $\mathrm{pa}(a)$ (parents of $a$) | Set of nodes that have causal relationship to node $a$. |
| $\mathrm{an}(a)$ (ancestors of $a$) | Set of nodes before node $a$ in the topological ordering. |
| $x_k$ | State of system $X$ at time $k$. |
| $\mathcal{X} := \{x_1, \dots, x_N\}$ | Set of states of system $X$. |
| $\mathcal{C} := \{c_1, \dots, c_{N'}\}$ | Set of states of other systems $C$. |
| $\mathcal{C}' := \mathrm{an}(x_N) \cap \mathcal{C}$ | Set of the ancestors of $x_N$ in $\mathcal{C}$. |
| $\mathrm{pa}_{\mathcal{X}}(c_l) := \mathrm{pa}(c_l) \cap \mathcal{X}$ | Set of the parents of $c_l$ in $\mathcal{X}$. |
| $\mathrm{pa}_{\mathcal{C}}(c_l) := \mathrm{pa}(c_l) \cap \mathcal{C}$ | Set of the parents of $c_l$ in $\mathcal{C}$. |
| $\mathcal{B}_{k+1} := \mathrm{pa}(x_{k+1}) \setminus \{x_k\}$ | Set of parents of $x_{k+1}$ outside of $\mathcal{X}$. |
| $I_{\rm ini} := I(x_1 : \mathrm{pa}(x_1))$ | Initial mutual information between $X$ and $C$. |
| $I^{\rm tr}_l := I(c_l : \mathrm{pa}_{\mathcal{X}}(c_l) \mid c_1, \dots, c_{l-1})$ | Transfer entropy from $X$ to $c_l$. |
| $I^{\rm tr} := \sum_{l:\, c_l \in \mathcal{C}'} I^{\rm tr}_l$ | Total transfer entropy from $X$ to $C$. |
| $I_{\rm fin} := I(x_N : \mathcal{C}')$ | Final mutual information between $X$ and $C$. |
| $\Theta := I_{\rm fin} - I^{\rm tr} - I_{\rm ini}$ | $-\langle \Theta \rangle$ is the available information about $X$ obtained by $C$. |
| $\sigma$ | Stochastic entropy change in $X$ and the heat bath. |

Table 1: Summary of notations.
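The distinction between a stochastic mutual information such as $I_{\rm ini}$ in Eq. (85) and its ensemble average can be made concrete with a small sketch. The joint distribution below is hypothetical (binary $x_1$ and a single binary parent $c$, with illustrative numbers); it shows that individual realizations can be negative while the average is nonnegative.

```python
import math

# Hypothetical joint distribution p(x1, c) of the initial state x1 and a
# single parent c in C (binary variables, illustrative numbers only).
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x = {x: sum(p for (xx, _), p in p_joint.items() if xx == x) for x in (0, 1)}
p_c = {c: sum(p for (_, cc), p in p_joint.items() if cc == c) for c in (0, 1)}


def stochastic_mutual_information(x, c):
    """I(x : c) = ln[p(x, c) / (p(x) p(c))] for one realization (x, c)."""
    return math.log(p_joint[(x, c)] / (p_x[x] * p_c[c]))


# Ensemble average <I>: the (nonnegative) mutual information.
average = sum(p * stochastic_mutual_information(x, c)
              for (x, c), p in p_joint.items())
print(stochastic_mutual_information(0, 1), average)
```

In this example the unlikely realization $(x_1, c) = (0, 1)$ carries a negative stochastic mutual information, whereas the ensemble average is positive.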


We next consider the transfer entropy from $X$ to $C$ during the dynamics. The
transfer entropy on Bayesian networks has been discussed in the literature. We here focus
on the role of the transfer entropy on Bayesian networks in terms of information
thermodynamics.

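Let $c_l \in \mathcal{C}$. Let $\mathrm{pa}_{\mathcal{X}}(c_l) := \mathrm{pa}(c_l) \cap \mathcal{X}$ be the set of the parents of $c_l$ in $\mathcal{X}$, and
$\mathrm{pa}_{\mathcal{C}}(c_l) := \mathrm{pa}(c_l) \cap \mathcal{C}$ be the set of the parents of $c_l$ in $\mathcal{C}$ (see also Fig. 8(c)). We note
that $\mathrm{pa}_{\mathcal{X}}(c_l) \cup \mathrm{pa}_{\mathcal{C}}(c_l) = \mathrm{pa}(c_l)$ and $\mathrm{pa}_{\mathcal{X}}(c_l) \cap \mathrm{pa}_{\mathcal{C}}(c_l) = \emptyset$. We then have

$$
\begin{aligned}
p(c_l|\mathrm{pa}(c_l)) &= p(c_l|\mathrm{pa}_{\mathcal{X}}(c_l), \mathrm{pa}_{\mathcal{C}}(c_l)) \\
&= p(c_l|\mathrm{pa}_{\mathcal{X}}(c_l), c_1, \dots, c_{l-1}),
\end{aligned} \tag{87}
$$

where we used Eq. (80) with $\mathcal{V}' = \{c_1, \dots, c_{l-1}\} \setminus \mathrm{pa}_{\mathcal{C}}(c_l)$.

The transfer entropy from system $X$ to state $c_l$ is defined as the conditional
mutual information between $c_l$ and $\mathrm{pa}_{\mathcal{X}}(c_l)$ under the condition of $\{c_1, \dots, c_{l-1}\}$. The
corresponding stochastic transfer entropy is given by

$$
\begin{aligned}
I^{\rm tr}_l &:= I(c_l : \mathrm{pa}_{\mathcal{X}}(c_l)|c_1, \dots, c_{l-1}) \\
&= \ln \frac{p(c_l, \mathrm{pa}_{\mathcal{X}}(c_l)|c_1, \dots, c_{l-1})}{p(c_l|c_1, \dots, c_{l-1})\,p(\mathrm{pa}_{\mathcal{X}}(c_l)|c_1, \dots, c_{l-1})} \\
&= \ln \frac{p(c_l|\mathrm{pa}_{\mathcal{X}}(c_l), c_1, \dots, c_{l-1})}{p(c_l|c_1, \dots, c_{l-1})}.
\end{aligned} \tag{88}
$$

It can also be rewritten by using the conditional stochastic Shannon entropy:

$$
I^{\rm tr}_l = s(c_l|c_1, \dots, c_{l-1}) - s(c_l|\mathrm{pa}_{\mathcal{X}}(c_l), c_1, \dots, c_{l-1}), \tag{89}
$$

which is analogous to Eq. (29). The ensemble average of $I^{\rm tr}_l$ is the transfer entropy
$\langle I^{\rm tr}_l \rangle$, which describes the amount of information about $X$ that is newly obtained
by $C$ at time $l$. $\langle I^{\rm tr}_l \rangle$ is nonnegative from the definition, and is zero if and only if
$\ln p(c_l|\mathrm{pa}_{\mathcal{X}}(c_l), c_1, \dots, c_{l-1}) = \ln p(c_l|c_1, \dots, c_{l-1})$, or equivalently $\mathrm{pa}_{\mathcal{X}}(c_l) = \emptyset$. The
total transfer entropy from $X$ to $C$ during the dynamics from $x_1$ to $x_N$ is then given
by

$$
I^{\rm tr} := \sum_{l:\, c_l \in \mathcal{C}'} I^{\rm tr}_l. \tag{90}
$$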

By summing up the foregoing information contents, we introduce a key informational
quantity $\Theta$:

$$
\Theta := I_{\rm fin} - I^{\rm tr} - I_{\rm ini}, \tag{91}
$$

which plays a crucial role in the generalized second law that will be discussed in Sec. 5.4.
Here, the minus of the ensemble average of $\Theta$ (i.e., $-\langle \Theta \rangle$) characterizes the available
information about $X$ obtained by $C$ during the dynamics from $x_1$ to $x_N$ (see also Fig. 8).
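To make the definition of the stochastic transfer entropy in Eq. (88) concrete, the following sketch evaluates it on a small hypothetical network $x_1 \to c_1$, $x_1 \to x_2$, $(c_1, x_2) \to c_2$ with binary variables. The network, the tables, and all helper names are assumptions made for illustration; here $\mathrm{pa}_{\mathcal{X}}(c_1) = \{x_1\}$ and $\mathrm{pa}_{\mathcal{X}}(c_2) = \{x_2\}$.

```python
import math
from itertools import product

# Hypothetical binary model: x1 -> c1, x1 -> x2, (c1, x2) -> c2.
P_X1 = {0: 0.5, 1: 0.5}


def p_c1(c1, x1):
    """p(c1 | x1): noisy copy of x1 with error probability 0.1."""
    return 0.9 if c1 == x1 else 0.1


def p_x2(x2, x1):
    """p(x2 | x1): the state is unchanged with probability 0.8."""
    return 0.8 if x2 == x1 else 0.2


def p_c2(c2, c1, x2):
    """p(c2 | c1, x2): copy of x2 whose error rate depends on c1."""
    error = 0.1 if c1 == 0 else 0.3
    return 1.0 - error if c2 == x2 else error


INDEX = {"x1": 0, "c1": 1, "x2": 2, "c2": 3}
joint = {
    (x1, c1, x2, c2): P_X1[x1] * p_c1(c1, x1) * p_x2(x2, x1) * p_c2(c2, c1, x2)
    for x1, c1, x2, c2 in product((0, 1), repeat=4)
}


def prob(**fixed):
    """Marginal probability of the partial assignment given by keywords."""
    return sum(p for v, p in joint.items()
               if all(v[INDEX[name]] == value for name, value in fixed.items()))


def transfer_1(x1, c1):
    """I^tr_1 = I(c1 : x1), since pa_X(c1) = {x1} and nothing precedes c1."""
    return math.log(p_c1(c1, x1) / prob(c1=c1))


def transfer_2(c1, x2, c2):
    """I^tr_2 = I(c2 : x2 | c1), since pa_X(c2) = {x2}."""
    return math.log(p_c2(c2, c1, x2) / (prob(c1=c1, c2=c2) / prob(c1=c1)))


avg_tr1 = sum(p * transfer_1(v[0], v[1]) for v, p in joint.items())
avg_tr2 = sum(p * transfer_2(v[1], v[2], v[3]) for v, p in joint.items())
print(avg_tr1, avg_tr2)
```

Both averages are nonnegative, as expected for (conditional) mutual information; the first one equals $\ln 2$ minus the binary entropy of the measurement error, since $x_1$ is uniform and $c_1$ is a noisy copy of it.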


Figure 8: Schematics of informational quantities on Bayesian networks. (a) The initial
correlation between $x_1$ and $\mathrm{pa}(x_1)$. (b) The final correlation between $x_N$ and $\mathcal{C}'$. (c)
The transfer entropy from $X$ to $c_l$.


5.3 Entropy production 

We next define the entropy production as the sum of the entropy
changes in system $X$ and the heat bath. While the key idea of the definition is
the same as in the case of the Markovian dynamics discussed in the previous section,
a careful argument is necessary for the entropy production on Bayesian networks,
because of the presence of other systems $C$.

We consider the subset of probability variables in $C$ (i.e., nodes in $\mathcal{C}$) that affect
the time evolution of $X$ from time $k$ to $k+1$, which is defined as (see Fig. 9)

$$
\mathcal{B}_{k+1} := \mathrm{pa}(x_{k+1}) \setminus \{x_k\} \subseteq \mathcal{C}. \tag{92}
$$

The transition probability of $X$ from time $k$ to $k+1$ is then written as

$$
p(x_{k+1}|\mathrm{pa}(x_{k+1})) = p(x_{k+1}|x_k, \mathcal{B}_{k+1}). \tag{93}
$$

Figure 9: Schematic of $\mathcal{B}_{k+1}$.

We note that $p(x_{k+1}|x_k, \mathcal{B}_{k+1})$ describes the transition probability from $x_k$ to $x_{k+1}$
under the condition that the states of $C$ that affect $X$ are given by $\mathcal{B}_{k+1}$. We define
the functional form of $p(x_{k+1}|x_k, \mathcal{B}_{k+1})$ with arguments $(x_{k+1}, x_k, \mathcal{B}_{k+1})$ by

$$
f(x_{k+1}, x_k, \mathcal{B}_{k+1}) := p(x_{k+1}|x_k, \mathcal{B}_{k+1}). \tag{94}
$$

We then define the backward transition probability as

$$
p_B(x_k|x_{k+1}, \mathcal{B}_{k+1}) := f(x_k, x_{k+1}, \mathcal{B}_{k+1}), \tag{95}
$$

which describes the transition probability from $x_{k+1}$ to $x_k$ under the same condition
$\mathcal{B}_{k+1}$ as the forward process.
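The construction in Eqs. (94) and (95) amounts to reusing the same function $f$ with its first two arguments exchanged. The sketch below illustrates this for a hypothetical two-state system with a binary condition `b` standing in for $\mathcal{B}_{k+1}$; the table `FORWARD` and the uniform prior used in the Bayes-rule comparison are assumptions made for illustration only.

```python
import math

# Hypothetical two-state forward transition probabilities, indexed as
# FORWARD[b][x][x_next] = p(x_next | x, b), where b plays the role of B_{k+1}.
FORWARD = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}},
    1: {0: {0: 0.4, 1: 0.6}, 1: {0: 0.2, 1: 0.8}},
}


def f(first, second, b):
    """Functional form of Eq. (94): f(x_next, x, b) = p(x_next | x, b)."""
    return FORWARD[b][second][first]


def p_backward(x, x_next, b):
    """Eq. (95): the same function f with x and x_next exchanged."""
    return f(x, x_next, b)


def bath_entropy_change(x, x_next, b):
    """ln[p(x_next | x, b) / p_B(x | x_next, b)] for a single step."""
    return math.log(f(x_next, x, b) / p_backward(x, x_next, b))


def bayes_conditional(x, x_next, b, p_x):
    """p(x | x_next, b) from the Bayes rule, for a prior p_x independent of b."""
    norm = sum(f(x_next, x_prev, b) * p_x[x_prev] for x_prev in (0, 1))
    return f(x_next, x, b) * p_x[x] / norm


uniform = {0: 0.5, 1: 0.5}
print(p_backward(0, 0, 0), bayes_conditional(0, 0, 0, uniform))
```

The backward probability is normalized over $x_k$ by construction, but in general it differs from the Bayes-rule conditional probability, which depends on the distribution of $x_k$ as well.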

Here, $p_B(x_k|x_{k+1}, \mathcal{B}_{k+1})$ is different from the conditional probability $p(x_k|x_{k+1}, \mathcal{B}_{k+1}) =
p(x_k, x_{k+1}, \mathcal{B}_{k+1})/p(x_{k+1}, \mathcal{B}_{k+1})$, which is obtained from the Bayes rule (77) of the
Bayesian network. To emphasize the difference, we used the suffix "B" that represents
"backward." We note that $p_B(x_k|x_{k+1}, \mathcal{B}_{k+1})$ is analogous to the backward transition probability
$p(x_k|x_{k+1}, \lambda_k)$ introduced earlier for Markovian dynamics, with $\lambda_k$ replaced by $\mathcal{B}_{k+1}$. In fact, in many situations, we can
assume that external parameter $\lambda_k$ is determined by $\mathcal{B}_{k+1}$; a typical case is feedback
control, as will be discussed in Secs. 6.2 and 6.3. We also note that the backward
probability $p_B(x_k|x_{k+1}, \mathcal{B}_{k+1})$ can be defined even in the presence of odd variables like
momentum, by slightly modifying definition (95).

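We then define the entropy change in the heat bath from time $k$ to $k+1$ in the
form of Eq. (48):

$$
\Delta s^{\rm bath}_k := \ln \frac{p(x_{k+1}|x_k, \mathcal{B}_{k+1})}{p_B(x_k|x_{k+1}, \mathcal{B}_{k+1})}. \tag{96}
$$

We note that $\Delta s^{\rm bath}_k$ can be identified with $-\beta Q_k$ in many situations. In fact, as
mentioned above, if $\mathcal{B}_{k+1}$ affects $x_k$ only through the external parameter, Eq. (96) is
equivalent to (48). In such a case, we can show that $\Delta s^{\rm bath}_k = -\beta Q_k$, as discussed
earlier for the Markovian case. The entropy change in the heat bath from time $1$ to $N$ is then given by

$$
\begin{aligned}
\Delta s^{\rm bath} &:= \sum_{k=1}^{N-1} \Delta s^{\rm bath}_k \\
&= \ln \frac{p(x_N|x_{N-1}, \mathcal{B}_N) \cdots p(x_3|x_2, \mathcal{B}_3)\,p(x_2|x_1, \mathcal{B}_2)}{p_B(x_1|x_2, \mathcal{B}_2)\,p_B(x_2|x_3, \mathcal{B}_3) \cdots p_B(x_{N-1}|x_N, \mathcal{B}_N)},
\end{aligned} \tag{97}
$$

which is analogous to Eq. (50). The total entropy change in $X$ and the heat bath
from time $1$ to $N$ is then defined as

$$
\sigma := \Delta s^{X} + \Delta s^{\rm bath}, \tag{98}
$$

where $\Delta s^{X} := \ln p(x_1) - \ln p(x_N)$ is the stochastic Shannon entropy change in $X$. Equation (98) is also written as

$$
\sigma = \ln \frac{p(x_N|x_{N-1}, \mathcal{B}_N) \cdots p(x_3|x_2, \mathcal{B}_3)\,p(x_2|x_1, \mathcal{B}_2)\,p(x_1)}{p_B(x_1|x_2, \mathcal{B}_2)\,p_B(x_2|x_3, \mathcal{B}_3) \cdots p_B(x_{N-1}|x_N, \mathcal{B}_N)\,p(x_N)}. \tag{99}
$$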

5.4 Generalized second law 

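We now consider the relationship between the second law of thermodynamics and
informational quantities. The lower bound of the entropy change in system $X$ and
the heat bath is given by $\langle \Theta \rangle$:

$$
\langle \sigma \rangle \geq \langle \Theta \rangle, \tag{100}
$$

or equivalently,

$$
\langle \sigma \rangle \geq \langle I_{\rm fin} \rangle - \langle I^{\rm tr} \rangle - \langle I_{\rm ini} \rangle, \tag{101}
$$

which is the generalized second law of thermodynamics on Bayesian networks.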

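The proof of the generalized second law (100) is as follows. We first show that
$\sigma - \Theta$ can be rewritten as the stochastic relative entropy:

$$
\begin{aligned}
\sigma - \Theta &= \ln \left[ \frac{p(x_1)}{p(x_N)} \prod_{k=1}^{N-1} \frac{p(x_{k+1}|x_k, \mathcal{B}_{k+1})}{p_B(x_k|x_{k+1}, \mathcal{B}_{k+1})} \right] + \ln \frac{p(x_1|\mathrm{pa}(x_1))}{p(x_1)} \\
&\quad - \ln \frac{p(x_N, \mathcal{C}')}{p(x_N)\,p(\mathcal{C}')} + \sum_{l:\, c_l \in \mathcal{C}'} \ln \frac{p(c_l|\mathrm{pa}(c_l))}{p(c_l|c_1, \dots, c_{l-1})} \\
&= \ln \frac{\prod_{k=1}^{N} p(x_k|\mathrm{pa}(x_k)) \prod_{l:\, c_l \in \mathcal{C}'} p(c_l|\mathrm{pa}(c_l))}{\prod_{k=1}^{N-1} p_B(x_k|x_{k+1}, \mathcal{B}_{k+1})\, p(x_N, \mathcal{C}')} \\
&= d_{\rm KL}(p(\mathcal{V}) \| p_B(\mathcal{V})),
\end{aligned} \tag{102}
$$

where we defined

$$
p_B(\mathcal{V}) := p(x_N, \mathcal{C}') \prod_{k=1}^{N-1} p_B(x_k|x_{k+1}, \mathcal{B}_{k+1}) \prod_{l:\, c_l \notin \mathcal{C}',\, c_l \in \mathcal{C}} p(c_l|\mathrm{pa}(c_l)). \tag{103}
$$

We can confirm that $p_B(\mathcal{V})$ is normalized, and can be regarded as a probability
distribution:

$$
\begin{aligned}
\sum_{\mathcal{V}} p_B(\mathcal{V}) &= \sum_{\mathcal{X}, \mathcal{C}'} p(x_N, \mathcal{C}') \prod_{k=1}^{N-1} p_B(x_k|x_{k+1}, \mathcal{B}_{k+1}) \\
&= \sum_{x_N, \mathcal{C}'} p(x_N, \mathcal{C}') \\
&= 1,
\end{aligned} \tag{104}
$$

where we used $x_k \in \mathcal{X}$, $\mathcal{B}_{k+1} \subseteq \mathcal{C}'$, and $\sum_{x_k} p_B(x_k|x_{k+1}, \mathcal{B}_{k+1}) = 1$. From Eq. (102)
and the nonnegativity of the relative entropy, we show that the ensemble average of
$\sigma - \Theta$ is nonnegative:

$$
\langle \sigma - \Theta \rangle = D_{\rm KL}(p \| p_B) \geq 0, \tag{105}
$$

which implies the generalized second law (100). The equality in (100) holds if and
only if $p(\mathcal{V}) = p_B(\mathcal{V})$.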

We consider the integral fluctuation theorem corresponding to inequality (100).
From Eq. (12) for the stochastic relative entropy, we have

$$
\langle e^{-d_{\rm KL}(p(\mathcal{V}) \| p_B(\mathcal{V}))} \rangle = 1, \tag{106}
$$

or equivalently,

$$
\langle e^{-\sigma + \Theta} \rangle = 1. \tag{107}
$$

This is the generalized integral fluctuation theorem for Bayesian networks. By applying
the Jensen inequality to Eq. (107), we again obtain inequality (100).

We note that, from inequality (100) and $\langle I_{\rm fin} \rangle \geq 0$, we obtain a weaker bound of
the entropy production:

$$
\langle \sigma \rangle \geq -\langle I^{\rm tr} \rangle - \langle I_{\rm ini} \rangle. \tag{108}
$$

This weaker inequality can also be rewritten as the nonnegativity of the relative
entropy $D_{\rm KL}(p \| \tilde{p}_B) \geq 0$, where the probability $\tilde{p}_B(\mathcal{V})$ is defined as

$$
\tilde{p}_B(\mathcal{V}) := p(x_N)\,p(\mathcal{C}') \prod_{k=1}^{N-1} p_B(x_k|x_{k+1}, \mathcal{B}_{k+1}) \prod_{l:\, c_l \notin \mathcal{C}',\, c_l \in \mathcal{C}} p(c_l|\mathrm{pa}(c_l)). \tag{109}
$$

The corresponding integral fluctuation theorem is given by

$$
\langle e^{-\sigma - I^{\rm tr} - I_{\rm ini}} \rangle = 1. \tag{110}
$$


6 Examples 

In the following, we illustrate special examples, and discuss the physical meaning of
the generalized second law (100).

6.1 Example 1: Markov chain

As the simplest example, we revisit the Markovian dynamics discussed in Sec. 3
from the viewpoint of Bayesian networks. In this case, $\mathcal{V} = \mathcal{X} = \{x_1, \dots, x_N\}$ and
$\mathcal{C} = \emptyset$. The Markovian property is characterized by $\mathrm{pa}(x_k) = \{x_{k-1}\}$ with $k \geq 2$,
and $\mathrm{pa}(x_1) = \emptyset$ (see also Fig. 10). Since $\mathcal{B}_{k+1} := \mathrm{pa}(x_{k+1}) \setminus \{x_k\} = \emptyset$, the entropy
change in the heat bath (96) reduces to that of the Markovian dynamics. From $\mathcal{C} = \emptyset$, we have $I_{\rm fin} = 0$, $I_{\rm ini} = 0$,
$I^{\rm tr} = 0$, and therefore $\Theta = 0$. Therefore, the definition of $\sigma$ in Eq. (98) reduces to
the entropy production of the Markovian dynamics, and the generalized second law (100) just reduces to $\langle \sigma \rangle \geq 0$.

6.2 Example 2: Feedback control with a single measurement 

We consider the system under feedback control with a single measurement, as is the
case for the Szilard engine. In this case, system $X$ is the measured system, and the
other system $C$ is a memory that stores the measurement outcome.

At time $k = 1$, a measurement on state $x_1$ is performed, and the obtained outcome
is stored in memory state $m_1$. The probability of outcome $m_1$ under the condition
of state $x_1$ is denoted by $p(m_1|x_1)$, which characterizes the measurement error. If
$p(m_1|x_1)$ is a delta function (i.e., $m_1$ is uniquely determined by $x_1$), the measurement is error-free. After the
measurement, the time evolution of $X$ is affected by $m_1$ such that the transition
probability of $X$ from time $k$ to $k+1$ is given by $p(x_{k+1}|x_k, m_1)$ ($k = 1, 2, \dots, N-1$),
which is feedback control. In terms of the physical interpretation discussed in Sec. 3,
the dynamics of system $X$ is determined by the external parameter. In the presence
of feedback control, the time evolution of the external parameter is determined by
$m_1$. The joint probability distribution of all the variables is then given by

$$
p(x_N, \dots, x_1, m_1) = p(x_N|x_{N-1}, m_1) \cdots p(x_2|x_1, m_1)\,p(m_1|x_1)\,p(x_1). \tag{111}
$$

The Bayesian network corresponding to the above dynamics is characterized as
follows. Let $\mathcal{X} := \{x_1, \dots, x_N\}$ be the set of the states of measured system $X$,
$\mathcal{C} := \{m_1\}$ be the memory state, and $\mathcal{V} := \mathcal{X} \cup \mathcal{C} = \{x_1, \dots, x_N, m_1\}$ be the set of all
nodes. The causal structure described by Eq. (111) is given by $\mathrm{pa}(x_k) = \{x_{k-1}, m_1\}$
for $k \geq 2$, $\mathrm{pa}(m_1) = \{x_1\}$, and $\mathrm{pa}(x_1) = \emptyset$ (see also Fig. 10).

Figure 10: Bayesian networks of the examples. Example 1: Markov chain. Example 2:
Feedback control with a single measurement. Example 3: Repeated feedback control
with multiple adaptive measurements.

Since $\mathcal{B}_{k+1} := \mathrm{pa}(x_{k+1}) \setminus \{x_k\} = \{m_1\}$ for $k \geq 1$, the entropy change (96) in
the heat bath from time $k$ to $k+1$ is given by

$$
\Delta s^{\rm bath}_k = \ln \frac{p(x_{k+1}|x_k, m_1)}{p_B(x_k|x_{k+1}, m_1)}. \tag{112}
$$

Considering the foregoing argument that $p(x_{k+1}|x_k, m_1)$ depends on $m_1$ through
external parameter $\lambda_k$, we can identify $\Delta s^{\rm bath}_k$ as the heat such that $\Delta s^{\rm bath}_k = -\beta Q_k$.

The total entropy production (98) from time $1$ to $N$ is given by

$$
\sigma = \ln \left[ \frac{p(x_1)}{p(x_N)} \prod_{k=1}^{N-1} \frac{p(x_{k+1}|x_k, m_1)}{p_B(x_k|x_{k+1}, m_1)} \right]. \tag{113}
$$

From $\mathrm{pa}(x_1) = \emptyset$, $\mathcal{C}' = \mathcal{C} = \{m_1\}$, and $\mathrm{pa}_{\mathcal{X}}(m_1) = \{x_1\}$, we have $I_{\rm fin} = I(x_N : m_1)$,
$I_{\rm ini} = 0$, $I^{\rm tr}_1 = I(x_1 : m_1)$, and therefore

$$
\Theta = I(x_N : m_1) - I(x_1 : m_1), \tag{114}
$$

which is the difference between the final and the initial mutual information. Therefore,
the generalized second law (100) reduces to

$$
\langle \sigma \rangle \geq \langle I(x_N : m_1) \rangle - \langle I(x_1 : m_1) \rangle. \tag{115}
$$

We note that inequality (115) is equivalent to the generalized second law for feedback
control obtained in previous studies.
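Inequality (115) and the integral fluctuation theorem (107) can be checked numerically on a minimal instance of this setup. The sketch below assumes $N = 2$, binary $x_1$, $m_1$, $x_2$, and hypothetical values for the initial distribution, the measurement error, and the feedback-dependent transition probabilities; none of these numbers come from the original text.

```python
import math
from itertools import product

# Hypothetical model with N = 2: x1 is measured (outcome m), then x1 -> x2
# under a transition rule that depends on m (the feedback).
P_X1 = {0: 0.7, 1: 0.3}
ERROR = 0.2  # measurement error probability (assumption)
# TRANSITION[m][x][x_next] = p(x_next | x, m); all entries strictly positive.
TRANSITION = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}},
    1: {0: {0: 0.3, 1: 0.7}, 1: {0: 0.2, 1: 0.8}},
}


def p_m(m, x1):
    """p(m1 | x1): the outcome equals x1 except with probability ERROR."""
    return 1.0 - ERROR if m == x1 else ERROR


# Joint distribution of Eq. (111) for N = 2, keyed by (x1, m, x2).
joint = {
    (x1, m, x2): P_X1[x1] * p_m(m, x1) * TRANSITION[m][x1][x2]
    for x1, m, x2 in product((0, 1), repeat=3)
}


def marginal(keep):
    """Marginal distribution over the index positions listed in `keep`."""
    out = {}
    for v, p in joint.items():
        key = tuple(v[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


p_x1m, p_x2m = marginal((0, 1)), marginal((2, 1))
p_m1, p_x2 = marginal((1,)), marginal((2,))


def sigma(x1, m, x2):
    """Eq. (113) with N = 2; p_B(x1 | x2, m) = TRANSITION[m][x2][x1]."""
    forward, backward = TRANSITION[m][x1][x2], TRANSITION[m][x2][x1]
    return math.log(P_X1[x1] * forward / (p_x2[(x2,)] * backward))


def theta(x1, m, x2):
    """Eq. (114): I(x_N : m1) - I(x1 : m1) for one trajectory."""
    i_fin = math.log(p_x2m[(x2, m)] / (p_x2[(x2,)] * p_m1[(m,)]))
    i_tr = math.log(p_x1m[(x1, m)] / (P_X1[x1] * p_m1[(m,)]))
    return i_fin - i_tr


avg_sigma = sum(p * sigma(*v) for v, p in joint.items())
avg_theta = sum(p * theta(*v) for v, p in joint.items())
fluctuation = sum(p * math.exp(theta(*v) - sigma(*v)) for v, p in joint.items())
print(avg_sigma, avg_theta, fluctuation)
```

For any strictly positive choice of the tables, the average of $e^{-\sigma + \Theta}$ equals one up to rounding, and $\langle \sigma \rangle \geq \langle \Theta \rangle$ follows; $\langle \sigma \rangle$ itself may be negative when the feedback exploits the measured information.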

Several simple models that achieve the equality in inequality (115) have been
proposed [32, 33, 35]. In general, the equality in inequality (115) is achieved if and
only if a kind of reversibility with feedback control is satisfied [32]; the reversibility
condition is given by

$$
\prod_{k=1}^{N-1} p(x_{k+1}|x_k, m_1) \cdot p(x_1, m_1) = \prod_{k=1}^{N-1} p_B(x_k|x_{k+1}, m_1) \cdot p(x_N, m_1). \tag{116}
$$

The left-hand side of Eq. (116) represents the probability of the forward trajectory
with feedback control. The physical meaning of the right-hand side is as follows.
Suppose that we start a backward process just after a forward process by keeping
$m_1$ for each trajectory. In the backward process, we use outcome $m_1$ obtained in
the forward process in order to determine the external parameter; we do not perform
feedback control in the backward process. The probability distribution of the
backward trajectories is then given by the right-hand side of Eq. (116).

We next consider a special case in which the initial and final states of system $X$ are
in thermal equilibrium. The initial distribution is given by

$$
p(x_1) = p_{\rm eq}(x_1) := e^{\beta (F(1) - E(x_1, 1))}, \tag{117}
$$

where $F(1)$ is the initial free energy and $E(x_1, 1)$ is the initial Hamiltonian. Since the
final Hamiltonian may depend on outcome $m_1$ due to the feedback control, the final
distribution under the condition of $m_1$ is the conditional canonical distribution

$$
p(x_N|m_1) = p_{\rm eq}(x_N|m_1) := e^{\beta (F(m_1) - E(x_N, m_1))}. \tag{118}
$$

Here, $F(m_1)$ is the final free energy and $E(x_N, m_1)$ is the final Hamiltonian, both
of which may depend on outcome $m_1$. The generalized second law (115) is then
equivalent to

$$
\beta \langle W - \Delta F \rangle \geq -\langle I(x_1 : m_1) \rangle, \tag{119}
$$

where $W$ is the work and $\Delta F := F(m_1) - F(1)$ is the free-energy difference. We note
that the ensemble average is needed for $\Delta F$, because $F(m_1)$ is a stochastic quantity
due to the stochasticity of $m_1$. Inequality (119) has been derived in Ref. [27]. The
derivation of inequality (119) from (115) is as follows. We first note that

$$
\begin{aligned}
s(x_N) &:= -\ln p(x_N) \\
&= -\ln p(x_N|m_1) + \ln \frac{p(x_N|m_1)}{p(x_N)} \\
&= s(x_N|m_1) + I(x_N : m_1).
\end{aligned} \tag{120}
$$

From Eqs. (117) and (118), we have

$$
s(x_1) = -\beta (F(1) - E(x_1, 1)), \qquad s(x_N|m_1) = -\beta (F(m_1) - E(x_N, m_1)). \tag{121}
$$

























Therefore, we obtain

$$
s(x_N) - s(x_1) = -\beta (\Delta F - \Delta E) + I(x_N : m_1), \tag{122}
$$

where $\Delta E := E(x_N, m_1) - E(x_1, 1)$. By substituting the ensemble average of Eq. (122) into inequality (115), we obtain

$$
-\beta \langle \Delta F - \Delta E \rangle + \langle I(x_N : m_1) \rangle - \beta \langle Q \rangle \geq \langle I(x_N : m_1) \rangle - \langle I(x_1 : m_1) \rangle. \tag{123}
$$

By noting the first law $\Delta E = W + Q$, we find that inequality (123) is equivalent to
inequality (119).

The simplest example of the present setup is the Szilard engine discussed in Sec. 1.2
(see also Fig. 1). In this case, the measurement is error-free and the outcome is
$m_1 = {\rm L}$ or ${\rm R}$ with probability $1/2$, and therefore $\langle I(x_1 : m_1) \rangle = \ln 2$. The final state
is no longer correlated with $m_1$ such that $\langle I(x_N : m_1) \rangle = 0$. The extracted work is
$-\langle W \rangle = \beta^{-1} \ln 2$, and the free-energy change is $\langle \Delta F \rangle = 0$. Therefore, for the Szilard
engine, both sides of inequality (119) are equal to $-\ln 2$, and the equality
in (119) is achieved. In this sense, the Szilard engine is an optimal information-thermodynamic
engine.
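The arithmetic of the Szilard-engine case can be verified directly. The sketch below assumes units in which $\beta = 1$ and represents the error-free measurement by a joint distribution supported on the two matching pairs; the variable names are introduced only for this illustration.

```python
import math

beta = 1.0  # inverse temperature; units with k_B T = 1 are an assumption

# Error-free measurement: the particle position and the outcome coincide,
# each of L and R occurring with probability 1/2.
p_joint = {("L", "L"): 0.5, ("R", "R"): 0.5}
p_x = {"L": 0.5, "R": 0.5}
p_m = {"L": 0.5, "R": 0.5}

# <I(x1 : m1)> = sum p(x, m) ln[p(x, m) / (p(x) p(m))]
mutual_information = sum(
    p * math.log(p / (p_x[x] * p_m[m])) for (x, m), p in p_joint.items()
)

extracted_work = math.log(2) / beta    # -<W> = ln 2 / beta
average_work = -extracted_work         # <W>
average_delta_f = 0.0                  # <Delta F> = 0 for the cyclic protocol

lhs = beta * (average_work - average_delta_f)  # left-hand side of (119)
rhs = -mutual_information                      # right-hand side of (119)
print(lhs, rhs)
```

Both sides evaluate to $-\ln 2$, so inequality (119) is saturated in this case.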

6.3 Example 3: Repeated feedback control with multiple 
measurements 

We consider the case of multiple measurements and feedback control. Let $x_k$ be the
state of system $X$ at time $k$ ($= 1, \dots, N$). Suppose that the measurement outcome
obtained at time $k$ ($= 1, \dots, N-1$), written as $m_k$, is affected by past trajectory
$(x_1, x_2, \dots, x_k)$ of system $X$. In other words, the measurement at time $k$ is performed
on trajectory $(x_1, x_2, \dots, x_k)$. Moreover, we assume that outcome $m_k$ is also affected
by sequence $(m_1, \dots, m_{k-1})$ of the past measurement outcomes, which describes the
situation that the way of measuring $X$ is changed depending on the past measurement
outcomes; such a measurement is called adaptive. The conditional probability of $m_k$
is then given by $p(m_k|x_1, \dots, x_{k-1}, x_k, m_1, \dots, m_{k-1})$.

Next, outcome $m_k$ is used for feedback control after time $k$, and the transition
probability from $x_k$ to $x_{k+1}$ is written as $p(x_{k+1}|x_k, m_1, \dots, m_{k-1}, m_k)$. In this case,
we assume that external parameter $\lambda_k$ at time $k$ is determined by memory states
$(m_1, \dots, m_{k-1}, m_k)$. The joint probability distribution of all the variables is then
given by

$$
p(x_1, \dots, x_N, m_1, \dots, m_{N-1}) = \prod_{k=1}^{N-1} p(x_{k+1}|x_k, m_1, \dots, m_k)\,p(m_k|x_1, \dots, x_k, m_1, \dots, m_{k-1}) \cdot p(x_1). \tag{124}
$$

If outcome $m_k$ is affected only by $x_k$ such that

$$
p(m_k|x_1, \dots, x_k, m_1, \dots, m_{k-1}) = p(m_k|x_k), \tag{125}
$$

the measurement is Markovian and non-adaptive. If the transition probability from
$x_k$ to $x_{k+1}$ depends only on $m_k$ such that

$$
p(x_{k+1}|x_k, m_1, \dots, m_k) = p(x_{k+1}|x_k, m_k), \tag{126}
$$

the feedback control is called Markovian. On the other hand, if $p(x_{k+1}|x_k, m_1, \dots, m_{k-1}, m_k)$
depends on $m_l$ with $l < k$, the feedback control is called non-Markovian, which
describes the effect of time-delay of the feedback loop.

The Bayesian network corresponding to the above dynamics is as follows. Let
$\mathcal{X} := \{x_1, \dots, x_N\}$, $\mathcal{C} := \{m_1, \dots, m_{N-1}\}$, and $\mathcal{V} := \mathcal{X} \cup \mathcal{C}$. The causal structure is
characterized by $\mathrm{pa}(x_k) = \{x_{k-1}, m_1, \dots, m_{k-1}\}$ for $k \geq 2$, $\mathrm{pa}(x_1) = \emptyset$, $\mathrm{pa}(m_k) =
\{x_1, \dots, x_{k-1}, x_k, m_1, \dots, m_{k-1}\}$ for $k \geq 2$, and $\mathrm{pa}(m_1) = \{x_1\}$. Figure 10 describes
the Bayesian network of a special case that

$$
p(m_k|x_k, \dots, x_1, m_1, \dots, m_{k-1}) = p(m_k|x_k, m_{k-1}) \tag{127}
$$

and $\mathrm{pa}(m_k) = \{x_k, m_{k-1}\}$ for $k \geq 2$.

Since $\mathcal{B}_{k+1} = \{m_1, \dots, m_k\}$, the entropy change (96) in the heat bath from time
$k$ to $k+1$ is given by

$$
\Delta s^{\rm bath}_k = \ln \frac{p(x_{k+1}|x_k, m_1, \dots, m_k)}{p_B(x_k|x_{k+1}, m_1, \dots, m_k)}. \tag{128}
$$

If we assume that $p(x_{k+1}|x_k, m_1, \dots, m_k)$ depends on $(m_1, \dots, m_k)$ only through
external parameter $\lambda_k$, the entropy change is identified with the heat: $\Delta s^{\rm bath}_k = -\beta Q_k$.

The total entropy production (98) from time $1$ to $N$ is defined as

$$
\sigma = \ln \left[ \frac{p(x_1)}{p(x_N)} \prod_{k=1}^{N-1} \frac{p(x_{k+1}|x_k, m_1, \dots, m_k)}{p_B(x_k|x_{k+1}, m_1, \dots, m_k)} \right]. \tag{129}
$$

From $\mathrm{pa}(x_1) = \emptyset$, $\mathcal{C}' = \mathcal{C} = \{m_1, \dots, m_{N-1}\}$, and $\mathrm{pa}_{\mathcal{X}}(m_k) = \{x_1, \dots, x_k\}$, we
have $I_{\rm ini} = 0$, $I_{\rm fin} = I(x_N : (m_1, \dots, m_{N-1}))$, $I^{\rm tr}_k = I((x_1, \dots, x_k) : m_k|m_1, \dots, m_{k-1})$,
and therefore

$$
\Theta = I(x_N : (m_1, \dots, m_{N-1})) - \sum_{k=1}^{N-1} I((x_1, \dots, x_k) : m_k|m_1, \dots, m_{k-1}). \tag{130}
$$

Therefore, the generalized second law (100) reduces to

$$
\langle \sigma \rangle \geq \langle I(x_N : (m_1, \dots, m_{N-1})) \rangle - \sum_{k=1}^{N-1} \langle I((x_1, \dots, x_k) : m_k|m_1, \dots, m_{k-1}) \rangle. \tag{131}
$$

We note that, in the special case illustrated in Fig. 10, we have $\mathrm{pa}_{\mathcal{X}}(m_k) = \{x_k\}$
and $I^{\rm tr}_k = I(x_k : m_k|m_1, \dots, m_{k-1})$. Therefore, $\Theta$ in Eq. (130) reduces to

$$
\Theta = I(x_N : (m_1, \dots, m_{N-1})) - \sum_{k=1}^{N-1} I(x_k : m_k|m_1, \dots, m_{k-1}). \tag{132}
$$

The equality in Eq. (131) holds if and only if the feedback reversibility is satisfied [32]:

$$
\begin{aligned}
&\prod_{k=1}^{N-1} p(x_{k+1}|x_k, m_1, \dots, m_k)\,p(m_k|x_1, \dots, x_k, m_1, \dots, m_{k-1}) \cdot p(x_1) \\
&\quad = \prod_{k=1}^{N-1} p_B(x_k|x_{k+1}, m_1, \dots, m_k) \cdot p(x_N, m_1, \dots, m_{N-1}).
\end{aligned} \tag{133}
$$

The right-hand side of Eq. (133) represents the probability distribution of the backward
trajectories. In a backward process, no feedback control is performed, and
the external parameter is changed by using the measurement outcomes obtained in
the corresponding forward process.

If the initial and final states of system $X$ are in thermal equilibrium such that
$p(x_1) = p_{\rm eq}(x_1) := e^{\beta (F(1) - E(x_1, 1))}$ and $p(x_N|m_1, \dots, m_{N-1}) = p_{\rm eq}(x_N|m_1, \dots, m_{N-1}) :=
e^{\beta (F(m_1, \dots, m_{N-1}) - E(x_N, m_1, \dots, m_{N-1}))}$, inequality (131) reduces to, by an argument similar to
the derivation of inequality (119),

$$
\beta \langle W - \Delta F \rangle \geq -\sum_{k=1}^{N-1} \langle I((x_1, \dots, x_k) : m_k|m_1, \dots, m_{k-1}) \rangle, \tag{134}
$$

which has been obtained in Refs. [30, 35] for the case of non-adaptive measurements.

6.4 Example 4: Markovian information exchanges 

We consider information exchanges between two interacting systems $X$ and $Y$. Let $x_k$
and $y_k$ be the states of system $X$ and $Y$ in time ordering $k = 1, \dots, N$. Suppose that
the transition from $x_k$ to $x_{k+1}$ is affected by $y_k$, and the transition from $y_k$ to $y_{k+1}$
is affected by $x_{k+1}$ (see also Fig. 11(a)). This assumption implies that the interaction
of $X$ and $Y$ is Markovian. During the dynamics, the transfer entropy from $X$ to $Y$
and vice versa can be positive, and the mutual information between two systems can
change. Therefore, such dynamics can describe Markovian information exchanges.
In the continuous-time limit, such dynamics are called Markov jump processes of
bipartite systems. We note that "bipartite systems" do not mean bipartite
graphs in the terminology of Bayesian networks.

The joint probability distribution of all the variables is given by

$$
p(x_1, y_1, \dots, x_N, y_N) = \prod_{k=1}^{N-1} p(y_{k+1}|x_{k+1}, y_k)\,p(x_{k+1}|x_k, y_k) \cdot p(y_1|x_1)\,p(x_1). \tag{135}
$$

The transition probability of each step from $(x_k, y_k)$ to $(x_{k+1}, y_{k+1})$ is given by

$$
p(x_{k+1}, y_{k+1}|x_k, y_k) = p(y_{k+1}|x_{k+1}, y_k)\,p(x_{k+1}|x_k, y_k), \tag{136}
$$

and correspondingly, the joint probability of $(x_k, y_k, x_{k+1}, y_{k+1})$ is given by

$$
p(x_{k+1}, y_{k+1}, x_k, y_k) = p(y_{k+1}|x_{k+1}, y_k)\,p(x_{k+1}|x_k, y_k)\,p(y_k|x_k)\,p(x_k). \tag{137}
$$

Figure 11: Bayesian networks of the examples. Example 4: Markovian information
exchanges between two systems. (a) Entire dynamics. (b) A single transition.
Example 5: Complex dynamics of three interacting systems.

First, we apply our general argument in Sec. 5 to the entire dynamics (135)
illustrated in Fig. 11(a). Let $\mathcal{X} = \{x_1, \dots, x_N\}$ be the set of the states of $X$,
$\mathcal{C} = \{y_1, \dots, y_N\}$ be the set of the states of $Y$, and $\mathcal{V} := \mathcal{X} \cup \mathcal{C} = \{x_1, y_1, \dots, x_N, y_N\}$
be the set of all states. The causal structure described by Eq. (135) is given by
$\mathrm{pa}(x_k) = \{x_{k-1}, y_{k-1}\}$ for $k \geq 2$, $\mathrm{pa}(y_k) = \{x_k, y_{k-1}\}$ for $k \geq 2$, $\mathrm{pa}(y_1) = \{x_1\}$, and
$\mathrm{pa}(x_1) = \emptyset$.

Since $\mathcal{B}_{k+1} = \{y_k\}$, the entropy change in the heat bath from time $k$ to $k+1$
is given by

$$
\Delta s^{\rm bath}_k = \ln \frac{p(x_{k+1}|x_k, y_k)}{p_B(x_k|x_{k+1}, y_k)}. \tag{138}
$$

The entropy production (98) from time $1$ to $N$ is then given by

$$
\sigma = \ln \left[ \frac{p(x_1)}{p(x_N)} \prod_{k=1}^{N-1} \frac{p(x_{k+1}|x_k, y_k)}{p_B(x_k|x_{k+1}, y_k)} \right]. \tag{139}
$$

From $\mathrm{pa}(x_1) = \emptyset$, $\mathcal{C}' = \mathrm{an}(x_N) \cap \mathcal{C} = \{y_1, \dots, y_{N-1}\}$, and $\mathrm{pa}_{\mathcal{X}}(y_k) = \{x_k\}$, we have
$I_{\rm ini} = 0$, $I_{\rm fin} = I(x_N : (y_1, \dots, y_{N-1}))$, $I^{\rm tr}_k = I(x_k : y_k|y_1, \dots, y_{k-1})$, and therefore

$$
\Theta = I(x_N : (y_1, \dots, y_{N-1})) - \sum_{k=1}^{N-1} I(x_k : y_k|y_1, \dots, y_{k-1}). \tag{140}
$$

The generalized second law (100) then reduces to

$$
\langle \sigma \rangle \geq \langle \Theta \rangle = \langle I(x_N : (y_1, \dots, y_{N-1})) \rangle - \sum_{k=1}^{N-1} \langle I(x_k : y_k|y_1, \dots, y_{k-1}) \rangle. \tag{141}
$$

Next, we apply our general argument in Sec. 5 only to a single transition described
by Eq. (137), which is illustrated in Fig. 11(b). Let $\mathcal{X} = \{x_k, x_{k+1}\}$ be the set of
the states of $X$, $\mathcal{C} = \{y_k, y_{k+1}\}$ be the set of the states of $Y$, and $\mathcal{V} := \mathcal{X} \cup \mathcal{C} =
\{x_k, y_k, x_{k+1}, y_{k+1}\}$ be the set of all states. The causal structure described by Eq.
(137) is given by $\mathrm{pa}(x_{k+1}) = \{x_k, y_k\}$, $\mathrm{pa}(y_{k+1}) = \{x_{k+1}, y_k\}$, $\mathrm{pa}(y_k) = \{x_k\}$, and
$\mathrm{pa}(x_k) = \emptyset$.

Since $\mathcal{B}_{k+1} = \{y_k\}$, the entropy change (96) in the heat bath from time $k$ to $k+1$
is equal to Eq. (138). The entropy production of the single transition, written as $\sigma_k$,
is given by

$$
\sigma_k := \ln \left[ \frac{p(x_k)}{p(x_{k+1})} \frac{p(x_{k+1}|x_k, y_k)}{p_B(x_k|x_{k+1}, y_k)} \right]. \tag{142}
$$

Here, the sum $\sum_{k=1}^{N-1} \sigma_k$ is equal to the entire entropy production $\sigma$ given in Eq. (139).

From $\mathrm{pa}(x_k) = \emptyset$, $\mathcal{C}' = \mathrm{an}(x_{k+1}) \cap \mathcal{C} = \{y_k\}$, and $\mathrm{pa}_{\mathcal{X}}(y_k) = \{x_k\}$, we have
$I_{\rm ini} = 0$, $I_{\rm fin} = I(x_{k+1} : y_k)$, and $I^{\rm tr} = I(x_k : y_k)$. Denoting $\Theta$ for the single transition
by $\Theta_k$, we obtain

$$
\Theta_k = I(x_{k+1} : y_k) - I(x_k : y_k). \tag{143}
$$

Therefore, the generalized second law (100) reduces to

$$
\langle \sigma_k \rangle \geq \langle \Theta_k \rangle = \langle I(x_{k+1} : y_k) \rangle - \langle I(x_k : y_k) \rangle. \tag{144}
$$

By summing up inequality (144) for $k = 1, 2, \dots, N-1$, we obtain

$$
\langle \sigma \rangle \geq \langle \Theta^{\rm d} \rangle, \tag{145}
$$

where

$$
\Theta^{\rm d} := \sum_{k=1}^{N-1} \Theta_k. \tag{146}
$$

Inequality (145) gives another bound of the entire entropy production $\langle \sigma \rangle$. The
informational quantity $\langle \Theta^{\rm d} \rangle$ is called the dynamic information flow, which has been studied
for the bipartite Markovian jump processes and coupled Langevin dynamics [53-58].

To summarize the foregoing argument, we have shown two inequalities (141) and
(145) for the same dynamics described in Fig. 11(a). Inequality (145) is obtained
by summing up inequality (144) for $k = 1, 2, \dots, N-1$, where inequality (144)
is obtained by applying our general inequality (100) only to the single transition
illustrated in Fig. 11(b).

We now discuss the relationship of two inequalities (141) and (145). We can
calculate the difference between $\langle \Theta^{\rm d} \rangle$ and $\langle \Theta \rangle$ as

$$
\begin{aligned}
\langle \Theta^{\rm d} \rangle - \langle \Theta \rangle
&= \sum_{k=1}^{N-1} \left[ \langle I(x_{k+1} : y_k) \rangle - \langle I(x_k : y_k) \rangle + \langle I(x_k : y_k|y_1, \dots, y_{k-1}) \rangle \right] - \langle I(x_N : (y_1, \dots, y_{N-1})) \rangle \\
&= \sum_{k=2}^{N-1} \left\langle \ln \frac{p(x_{k+1}, y_k)\,p(x_k, y_1, \dots, y_k)}{p(x_k, y_k)\,p(x_{k+1}, y_1, \dots, y_k)} \right\rangle \\
&= \sum_{k=2}^{N-1} \left[ \langle I(x_k : (y_1, \dots, y_{k-1})|y_k) \rangle - \langle I(x_{k+1} : (y_1, \dots, y_{k-1})|y_k) \rangle \right] \\
&\geq 0,
\end{aligned} \tag{147}
$$

where we used the data processing inequality

$$
\langle I(x_k : (y_1, \dots, y_{k-1})|y_k) \rangle \geq \langle I(x_{k+1} : (y_1, \dots, y_{k-1})|y_k) \rangle \tag{148}
$$

for the following conditional Markov chain:

$$
p(x_k, x_{k+1}, y_1, \dots, y_{k-1}|y_k) = p(x_{k+1}|x_k, y_k)\,p(x_k|y_1, \dots, y_{k-1}, y_k)\,p(y_1, \dots, y_{k-1}|y_k). \tag{149}
$$

Therefore, we obtain

$$
\langle \sigma \rangle \geq \langle \Theta^{\rm d} \rangle \geq \langle \Theta \rangle, \tag{150}
$$

which implies that the dynamic information flow $\langle \Theta^{\rm d} \rangle$ gives a tighter bound of the
entire entropy production than $\langle \Theta \rangle$. This hierarchy has also been shown in Ref. [56]
for coupled Langevin dynamics.
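The hierarchy (150) can be checked numerically for a small instance of the dynamics (135). The sketch below assumes $N = 3$, binary states, and hypothetical time-independent transition tables (all numbers are illustrative); the node $y_3$ is summed out because it enters none of the three quantities. Ensemble averages of (conditional) mutual information are evaluated from Shannon entropies of marginals.

```python
import math
from itertools import product

NAMES = ("x1", "y1", "x2", "y2", "x3")
P_X1 = {0: 0.6, 1: 0.4}
# Hypothetical tables: probability that the next state equals 0.
X_TABLE = {(0, 0): 0.9, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}  # key (x_k, y_k)
Y_TABLE = {(0, 0): 0.8, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.1}  # key (x_{k+1}, y_k)


def p_x(x_next, x, y):
    """f(x_next, x, y) = p(x_{k+1} | x_k, y_k)."""
    p0 = X_TABLE[(x, y)]
    return p0 if x_next == 0 else 1.0 - p0


def p_y(y_next, x_next, y):
    """p(y_{k+1} | x_{k+1}, y_k)."""
    p0 = Y_TABLE[(x_next, y)]
    return p0 if y_next == 0 else 1.0 - p0


def p_y1(y1, x1):
    """p(y1 | x1): noisy copy of x1."""
    return 0.75 if y1 == x1 else 0.25


# Eq. (135) for N = 3; y3 is summed out because it does not enter any bound.
joint = {
    v: P_X1[v[0]] * p_y1(v[1], v[0]) * p_x(v[2], v[0], v[1])
       * p_y(v[3], v[2], v[1]) * p_x(v[4], v[2], v[3])
    for v in product((0, 1), repeat=5)
}


def marginal(names):
    idx = [NAMES.index(n) for n in names]
    out = {}
    for v, p in joint.items():
        key = tuple(v[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(names):
    return -sum(p * math.log(p) for p in marginal(names).values() if p > 0)


def mi(a, b, cond=()):
    """Average (conditional) mutual information <I(a : b | cond)>."""
    a, b, cond = tuple(a), tuple(b), tuple(cond)
    return (entropy(a + cond) + entropy(b + cond)
            - entropy(a + b + cond) - entropy(cond))


p_x3 = marginal(("x3",))


def sigma(v):
    """Eq. (139) for N = 3, with p_B(x_k | x_{k+1}, y_k) = f(x_k, x_{k+1}, y_k)."""
    x1, y1, x2, y2, x3 = v
    forward = p_x(x2, x1, y1) * p_x(x3, x2, y2)
    backward = p_x(x1, x2, y1) * p_x(x2, x3, y2)
    return math.log(P_X1[x1] * forward / (p_x3[(x3,)] * backward))


avg_sigma = sum(p * sigma(v) for v, p in joint.items())
theta = (mi(["x3"], ["y1", "y2"]) - mi(["x1"], ["y1"])
         - mi(["x2"], ["y2"], ["y1"]))                       # Eq. (140)
theta_d = (mi(["x2"], ["y1"]) - mi(["x1"], ["y1"])
           + mi(["x3"], ["y2"]) - mi(["x2"], ["y2"]))        # Eq. (146)
print(avg_sigma, theta_d, theta)
```

For this choice of tables the three printed numbers appear in non-increasing order, in line with inequality (150); changing the tables (while keeping all probabilities strictly positive) changes the values but not the ordering.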

6.5 Example 5: Complex dynamics 

We consider three systems that interacts with each other as illustrated in Fig. [TT] In 
this case, V := {|/i, Xi, ^i, 0:2, ^2,1/2, ^3}, pa(|/i) = 0 , pa(xi) = pei{zi) = 

pa(a;2) = pa(z2) = {xi,Zi}, pa(|/2) = {yi,X2,Z2}, pa(a;3) = {0:2,1/2}, and 

pa(^3) = {a;2, Z2}. The joint probability of V is given by 

p(y) =p{z3\x2, Z2)pixs\x2, |/2)p(l/2|l/i, X 2 , Z2)piz2\xi, Zi)p{x2\xi, Zi)p{zi\yi)p{xi\yi)p{yi). 

( 151 ) 

We focus on system $X$ with $X := \{x_1, x_2, x_3\}$. The other systems are given by
$Y$ and $Z$, which constitute $C$ with $C = \{c_1 = y_1, c_2 = z_1, c_3 = z_2, c_4 = y_2, c_5 = z_3\}$.
Since $B_2 = \{z_1\}$ and $B_3 = \{y_2\}$, the total entropy production $\sigma$ is defined as

\[
\sigma := \ln \frac{p(x_3 \mid x_2, y_2)\, p(x_2 \mid x_1, z_1)\, p(x_1)}{p_B(x_1 \mid z_1, x_2)\, p_B(x_2 \mid y_2, x_3)\, p(x_3)}.
\tag{152}
\]
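
The sets that enter this example can be derived mechanically from the parent sets of the network. The sketch below assumes the definitions $B_{k+1} = \mathrm{pa}(x_{k+1}) \setminus \{x_k\}$, $C' = \mathrm{an}(x_3) \cap C$, and $\mathrm{pa}_X(c) = \mathrm{pa}(c) \cap X$, which reproduce the values quoted in this example; the variable and function names are hypothetical.

```python
# Parent sets of the network whose joint probability is Eq. (151).
pa = {
    "y1": set(),
    "x1": {"y1"},
    "z1": {"y1"},
    "x2": {"x1", "z1"},
    "z2": {"x1", "z1"},
    "y2": {"y1", "x2", "z2"},
    "x3": {"x2", "y2"},
    "z3": {"x2", "z2"},
}
order = ["y1", "x1", "z1", "x2", "z2", "y2", "x3", "z3"]  # topological ordering
X = ["x1", "x2", "x3"]                                    # system of interest

def ancestors(node):
    """All nodes from which `node` can be reached along directed edges."""
    seen, stack = set(), list(pa[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(pa[n])
    return seen

# Other systems, kept in topological order: c_1, ..., c_5.
C = [v for v in order if v not in X]
# Assumed definition: B_{k+1} = pa(x_{k+1}) without x_k.
B = {X[k + 1]: pa[X[k + 1]] - {X[k]} for k in range(len(X) - 1)}
# Assumed definition: C' = ancestors of the final state x_3 that lie in C.
C_prime = [c for c in C if c in ancestors(X[-1])]
# Assumed definition: pa_X(c) = parents of c that belong to X.
pa_X = {c: pa[c] & set(X) for c in C_prime}

print(C, B, C_prime, pa_X, sep="\n")
```

Note that $z_3$ drops out of $C'$ because it is not an ancestor of $x_3$, which is why it does not appear in the informational terms of this example.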

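From $C' = \{y_1, z_1, z_2, y_2\}$, $\mathrm{pa}(x_1) = \{y_1\}$, $\mathrm{pa}_X(y_1) = \emptyset$,
$\mathrm{pa}_X(z_1) = \emptyset$, $\mathrm{pa}_X(z_2) = \{x_1\}$, and $\mathrm{pa}_X(y_2) = \{x_2\}$, we have
$I_{\mathrm{fin}} = I(x_3 : \{y_1, z_1, z_2, y_2\})$, $I_{\mathrm{ini}} = I(x_1 : y_1)$,
$I_{\mathrm{tr}}^{1} = 0$, $I_{\mathrm{tr}}^{2} = 0$, $I_{\mathrm{tr}}^{3} = I(x_1 : z_2 \mid y_1, z_1)$, and
$I_{\mathrm{tr}}^{4} = I(x_2 : y_2 \mid y_1, z_1, z_2)$. The generalized second law (100) then reduces to

\[
\langle \sigma \rangle \geq \langle I(x_3 : \{y_1, z_1, z_2, y_2\}) \rangle - \langle I(x_1 : y_1) \rangle - \langle I(x_1 : z_2 \mid y_1, z_1) \rangle - \langle I(x_2 : y_2 \mid y_1, z_1, z_2) \rangle.
\tag{153}
\]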
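
Inequality (153) can be tested on a toy model. The following sketch assumes binary variables, random conditional tables for the factors of (151), and a particular choice of backward probability, $p_B(x_k \mid B_{k+1}, x_{k+1})$ taken as the forward kernel with its initial and final states exchanged. That choice is an illustrative assumption, not the article's prescription; the only property it needs here is normalization over $x_k$. Helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_cpt(*shape):
    """Random conditional probability table, normalized over the last axis."""
    t = rng.random(shape) + 0.05
    return t / t.sum(axis=-1, keepdims=True)

# Axis letters: a=y1, b=x1, c=z1, d=x2, e=z2, f=y2, g=x3, h=z3.
p_y1 = random_cpt(2)           # p(y1)               [a]
p_x1 = random_cpt(2, 2)        # p(x1 | y1)          [a, b]
p_z1 = random_cpt(2, 2)        # p(z1 | y1)          [a, c]
p_x2 = random_cpt(2, 2, 2)     # p(x2 | x1, z1)      [b, c, d]
p_z2 = random_cpt(2, 2, 2)     # p(z2 | x1, z1)      [b, c, e]
p_y2 = random_cpt(2, 2, 2, 2)  # p(y2 | y1, x2, z2)  [a, d, e, f]
p_x3 = random_cpt(2, 2, 2)     # p(x3 | x2, y2)      [d, f, g]
p_z3 = random_cpt(2, 2, 2)     # p(z3 | x2, z2)      [d, e, h]

# Joint probability of Eq. (151).
joint = np.einsum("a,ab,ac,bcd,bce,adef,dfg,deh->abcdefgh",
                  p_y1, p_x1, p_z1, p_x2, p_z2, p_y2, p_x3, p_z3)

def marginal(letters):
    """Marginal of the joint over the axes named by `letters`."""
    return np.einsum("abcdefgh->" + letters, joint)

def entropy(letters):
    m = marginal("".join(letters))
    return float(-(m * np.log(m)).sum())

def mi(a, b, c=""):
    """Ensemble-averaged conditional mutual information <I(a : b | c)>."""
    a, b, c = list(a), list(b), list(c)
    return entropy(a + c) + entropy(b + c) - entropy(a + b + c) - entropy(c)

# <sigma> from Eq. (152), as an array over (x1, z1, x2, y2, x3) = (b, c, d, f, g).
num = np.einsum("dfg,bcd,b->bcdfg", p_x3, p_x2, marginal("b"))
# Assumed backward kernels: forward tables with initial and final states swapped,
# p_B(x1 | z1, x2) = p_x2[x2, z1, x1] and p_B(x2 | y2, x3) = p_x3[x3, y2, x2].
den = np.einsum("dcb,gfd,g->bcdfg", p_x2, p_x3, marginal("g"))
sigma_avg = float((marginal("bcdfg") * np.log(num / den)).sum())

# Right-hand side of (153).
theta = (mi("g", "acef")      # <I(x3 : {y1, z1, z2, y2})>
         - mi("b", "a")       # <I(x1 : y1)>
         - mi("b", "e", "ac") # <I(x1 : z2 | y1, z1)>
         - mi("d", "f", "ace"))  # <I(x2 : y2 | y1, z1, z2)>

print("<sigma> =", sigma_avg, " bound =", theta)
```

Changing the seed or the table sizes gives other instances of the same network; the bound is expected to hold for each of them, since the gap between the two sides is a relative entropy whenever $p_B$ is normalized.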


7 Summary and prospects 

In this chapter, we have reviewed a general framework of information thermodynamics
on the basis of Bayesian networks [52]. In our framework, Bayesian networks
are used to graphically characterize stochastic dynamics of nonequilibrium thermodynamic
systems. Each node of a Bayesian network describes a state of a physical
system at a particular time, and each edge describes the causal relationship in the
stochastic dynamics. A simple application of our framework is the setup of “Maxwell’s
demon,” which performs measurements and feedback control, and can extract
work by using information. Moreover, our framework is not restricted to such simple
measurement-feedback situations, but is applicable to a broad class of nonequilibrium
dynamics with information exchanges.

Our main result is the generalized second law of thermodynamics (100). The
entropy production $\langle \sigma \rangle$, which is the sum of the entropy changes in system $X$ and the
heat bath, is bounded by an informational quantity $\langle \Theta \rangle$, which consists of the initial
and final mutual information between system $X$ and other systems $C$, and the transfer
entropy from $X$ to $C'$ during the dynamics. A key ingredient here is the transfer
entropy, which quantifies the directional information transfer from one stochastic system
to another. The physical meaning of the generalized second law
is that the entropy reduction of system $X$ is bounded by the available information
about $X$ obtained by $C$. We note that the generalized second law is derived as a
consequence of the nonnegativity of the relative entropy, as shown in (105), and also
as a consequence of the integral fluctuation theorem (107). We have also discussed,
in Sec. 6.4, the relationship between the generalized second law with the transfer
entropy (141) and that with the dynamic information flow (145); the latter
is stronger. While we have focused on discrete-time dynamics in this chapter, we can
also formulate continuous-time dynamics by Bayesian networks, where we assume
that edges represent infinitesimal transitions [5^163] .

For the case of quantum systems, the effect of a single quantum measurement and 
feedback control has been studied, and the generalizations of the second law and the 
fluctuation theorem have been derived in the quantum regime [IMS]. However, the 
generalization of the formulation with Bayesian networks to the quantum regime has 
been elusive, which is a fundamental open problem. 

Potential applications of information thermodynamics beyond the conventional
setup of Maxwell’s demon can be found in the field of biophysics. In fact, there
have been several works that analyze the adaptation process of living cells in terms
of information thermodynamics |M1163] . For example, by applying the generalized
second law to biological signal transduction of Escherichia coli (E. coli) chemotaxis,
we found that the robustness of adaptation is quantitatively characterized by the
transfer entropy inside a feedback loop of the signal transduction [63]. Moreover,
it has been found that the E. coli chemotaxis is inefficient (dissipative) as a conventional
thermodynamic engine, but is efficient as an information-thermodynamic
engine. These results suggest that information thermodynamics is indeed useful to
analyze autonomous information processing in biological systems.

Another potential application of information thermodynamics would be machine
learning, because neural networks perform stochastic information processing on complex
networks. In fact, there has been an attempt to analyze neural networks in
terms of information thermodynamics [20]. Moreover, information thermodynamics
of neural information processing in brains would also be another fundamental open
problem.